Artificial Intelligence · May 18, 2026

The Mathematics of Text Embeddings: High-Dimensional Semantic Vector Spaces

By LLM Practitioner

To an AI model, words are not letters; they are coordinates in a high-dimensional mathematical space. When an embedding model processes a block of text, it maps that text to a dense vector (an array of numbers, typically 768 to 3072 dimensions depending on the embedding model). In this space, the direction of a vector encodes its semantic meaning: words or sentences with similar meanings are mapped to vectors that point in nearly the same direction.

To determine how similar two sentences are, we compare the directions of their vectors using a metric called cosine similarity, which is the cosine of the angle between them. Mathematically, the cosine similarity of two vectors, A and B, is the dot product of the vectors divided by the product of their magnitudes: `similarity = (A · B) / (||A|| * ||B||)`. The result is a value between -1 and 1. A similarity of 1 means the vectors point in exactly the same direction, representing near-identical semantic content. A similarity of 0 means the vectors are orthogonal, representing unrelated meanings, and a similarity of -1 means they point in opposite directions.
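The formula can be sketched in a few lines of plain Python. This is an illustrative version only: the function name `cosine_similarity` and the tiny two-dimensional vectors are assumptions made for the example, and a production system would more likely use a vectorized numerical library.

```python
import math


def cosine_similarity(a, b):
    """Return (A · B) / (||A|| * ||B||) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 2-dimensional vectors (real embeddings have hundreds of dimensions).
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])   # same direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # right angle
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])        # opposite
```

Note that `[1.0, 0.0]` and `[2.0, 0.0]` differ in length but not in direction, so they score the same as identical vectors: the metric ignores magnitude.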

In TellPDF, this mechanism powers the Document QA feature. When you ask a question like 'what is the termination clause in this contract?', we first generate the query's vector embedding. We then compare it against the pre-computed embeddings of all text segments in the PDF. The segments with the highest cosine similarity are extracted and passed to the Gemini LLM as context. This process, known as Retrieval-Augmented Generation (RAG), allows the model to answer questions accurately without needing to read the entire document on every turn.
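The retrieval step described above can be sketched as a ranking over stored segment vectors. This is a minimal illustration, not TellPDF's actual code: the helper `top_k_segments`, the segment texts, and the three-dimensional vectors are all hypothetical, and the embedding and LLM calls are left out entirely.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Return (A · B) / (||A|| * ||B||); assumes non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_segments(query_vec, segments, k=2):
    """Return the k (score, text) pairs most similar to the query, best first.

    `segments` is a list of (text, vector) pairs whose vectors were
    computed ahead of time.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in segments]
    return heapq.nlargest(k, scored)


# Hypothetical pre-computed embeddings (made-up numbers, 3 dimensions only).
segments = [
    ("Either party may terminate with 30 days notice.", [0.9, 0.1, 0.0]),
    ("Invoices are payable within 45 days.", [0.1, 0.9, 0.1]),
    ("This agreement is governed by the laws of ...", [0.0, 0.2, 0.9]),
]

# Hypothetical embedding of the user's question about termination.
query_vec = [0.8, 0.2, 0.1]

results = top_k_segments(query_vec, segments, k=2)
context = "\n".join(text for _score, text in results)
```

In this sketch, `context` holds only the best-matching segments, and it is this string, rather than the whole document, that would be passed to the language model along with the question.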

