Intro to Embedding Models

What Are Embedding Models?

Embedding models convert text into numbers—specifically, high-dimensional vectors (arrays of numbers). These vectors capture the semantic meaning of the text so that similar texts have similar vector representations.

You can store and compare these vectors using methods like cosine similarity to find similar content.
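As a minimal sketch of the comparison step, cosine similarity can be computed in pure Python (no external libraries) as the dot product of two vectors divided by the product of their lengths:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same direction score ~1.0,
# orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # ~1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

In practice you would apply this to the 768-dimensional vectors returned by the API, usually via a vector database or a numerics library rather than a hand-rolled loop.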


API Usage

Endpoint

POST https://api.totalgpt.ai/v1/embeddings

Required Headers

Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

Request Body

{
  "input": ["Hello world"],
  "model": "intfloat-multilingual-e5-base"
}

Example cURL Request

curl -X POST https://api.totalgpt.ai/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["Hello world"],
    "model": "intfloat-multilingual-e5-base"
  }'
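The same request can be issued from Python using only the standard library. This sketch mirrors the cURL example above (same endpoint, headers, and body); YOUR_API_KEY is a placeholder for your real key:

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your real key
ENDPOINT = "https://api.totalgpt.ai/v1/embeddings"

def build_request(texts, model="intfloat-multilingual-e5-base"):
    """Assemble the POST request shown in the cURL example."""
    payload = json.dumps({"input": texts, "model": model}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

def embed(texts):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(texts)) as resp:
        return json.load(resp)

# response = embed(["Hello world"])
# vector = response["data"][0]["embedding"]
```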

What Are Embeddings Used For?

Embeddings power a wide range of AI and NLP use cases:

  • Semantic Search – Find similar content, FAQs, or documents.
  • Chatbots & QA Systems – Retrieve relevant knowledge for better answers.
  • Clustering & Classification – Group similar texts without relying on keywords.
  • Multilingual Applications – Support cross-language text matching.

Embeddings in RAG (Retrieval-Augmented Generation)

RAG is a technique that improves generative models by combining them with retrieval: relevant documents are fetched at query time and supplied to the model as context. Here’s how embeddings fit into the RAG cycle:

RAG Pipeline Overview

  1. Embed your documents
    → Use the /embeddings endpoint to convert your content into vectors.
  2. Store embeddings in a vector database
    → Use tools like Pinecone, Weaviate, or Faiss.
  3. Embed the user query
    → Use the same endpoint and model.
  4. Search for the most relevant documents
    → Compare the query vector to your stored vectors using cosine similarity.
  5. Pass the top documents to a language model
    → The model generates an answer using the retrieved context.
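The five steps above can be sketched end to end. To keep the example offline and self-contained, a toy bag-of-words counter stands in for the /embeddings API and a plain Python list stands in for the vector database; in production you would replace both with real API calls and a store like Pinecone or Weaviate:

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Stand-in for steps 1 and 3: in production, call the /embeddings
    endpoint (with the same model for documents and queries alike).
    Here a bag-of-words count vector keeps the example offline."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Step 4: cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Steps 1-2: embed the documents and keep the vectors in a simple
# in-memory "vector database" (a list, standing in for Pinecone etc.).
documents = [
    "Password reset instructions: click 'forgot password' on the login page",
    "Shipping policy: orders ship within two business days",
    "Billing FAQ: invoices are emailed monthly",
]
index = [toy_embed(doc) for doc in documents]

# Steps 3-4: embed the query with the SAME embedding function,
# then rank documents by similarity to the query vector.
query = "How do I reset my password?"
query_vec = toy_embed(query)
best = max(range(len(documents)), key=lambda i: cosine(query_vec, index[i]))

# Step 5: the retrieved text goes into the language model's prompt.
prompt = f"Context: {documents[best]}\n\nQuestion: {query}"
print(documents[best])  # the password-reset document
```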


Example Use Case

User question:

“How do I reset my password?”

System process:

  • Query is embedded and matched against a vector database.
  • Top related document (e.g., “Password reset instructions”) is retrieved.
  • That document is passed to a language model to generate a helpful answer.

Response

The API returns a JSON object containing one embedding vector for each string in the input list:

{
  "model": "intfloat/multilingual-e5-base",
  "data": [
    {
      "embedding": [0.0185, 0.0364, -0.0030, ...],
      "index": 0,
      "object": "embedding"
    }
  ],
  "object": "list",
  "usage": {
    "prompt_tokens": 4,
    "total_tokens": 4
  }
}

Note: The embedding above is truncated for display; the full vector returned by this model has 768 dimensions.
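Extracting the vectors from this response is straightforward. This sketch parses the example body above (with the vector truncated to three values for brevity); embeddings come back in the same order as the "input" list, and the "index" field lets you match them up explicitly:

```python
import json

# The response body from the example above (vector truncated for brevity).
raw = """{
  "model": "intfloat/multilingual-e5-base",
  "data": [
    {"embedding": [0.0185, 0.0364, -0.0030],
     "index": 0,
     "object": "embedding"}
  ],
  "object": "list",
  "usage": {"prompt_tokens": 4, "total_tokens": 4}
}"""

response = json.loads(raw)

# Map each input's position to its vector via the "index" field.
vectors = {item["index"]: item["embedding"] for item in response["data"]}
print(len(vectors[0]))                    # 3 here; 768 in a full response
print(response["usage"]["total_tokens"])  # 4
```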