Infermatic API Documentation

Models Endpoint (LLM) #

Request: #

GET https://api.totalgpt.ai/v1/models

Headers #

{
  "Authorization": "Bearer YOUR_API_KEY"
}

Response #

{
  "data": [
    {
      "id": "Sao10K-L3.3-70B-Euryale-v2.3-FP8-Dynamic",
      "object": "model",
      "created": 1677610602,
      "owned_by": "openai"
    },
    ...
  ]
}

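For reference, here is a minimal Python sketch that lists the available models. It assumes the requests package is installed; the endpoint and authorization header are exactly as documented above.

import requests

API_KEY = "YOUR_API_KEY"

# Fetch the list of available LLM models
resp = requests.get(
    "https://api.totalgpt.ai/v1/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
resp.raise_for_status()

# Print each model id from the "data" array
for model in resp.json()["data"]:
    print(model["id"])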

Chat Completions #

Given a list of messages comprising a conversation, the model will return a response. Customize the output by adjusting parameters like length, creativity, and repetition control.

Endpoints #

Request to a Model #

Request: #

POST https://api.totalgpt.ai/v1/chat/completions

Headers: #

{
  "Authorization": "Bearer YOUR_API_KEY",
  "Content-Type": "application/json"
}


Example Request (JSON): #

{
  "model": "Sao10K-72B-Qwen2.5-Kunou-v1-FP8-Dynamic",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant who answers concisely."},
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 7000,
  "temperature": 0.7,
  "top_k": 40,
  "repetition_penalty": 1.2
}
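
The same request can be sent from Python with the requests package, as in the sketch below. The response fields are assumed to follow the OpenAI-compatible shape (the reply text at choices[0].message.content); adjust if your client differs.

import requests

API_KEY = "YOUR_API_KEY"

payload = {
    "model": "Sao10K-72B-Qwen2.5-Kunou-v1-FP8-Dynamic",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant who answers concisely."},
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 7000,
    "temperature": 0.7,
    "top_k": 40,
    "repetition_penalty": 1.2,
}

resp = requests.post(
    "https://api.totalgpt.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json=payload,
)
resp.raise_for_status()

# Assumes an OpenAI-compatible response: the reply text sits at
# choices[0].message.content
print(resp.json()["choices"][0]["message"]["content"])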

 

Handling Errors for Unsupported System Prompts #

This error occurs when a system prompt is sent to a model that does not support it. The following models cannot process requests that include a system prompt and will return a 500 error like the one below:

  • TheDrummer-Rocinante-12B-v1.1
  • Midnight-Miqu-70B-v1.5
  • TheDrummer-Anubis-70B-v1-FP8-Dynamic

{
  "error": {
    "message": "litellm.APIError: APIError: OpenAIException - Internal Server Error\nReceived Model Group=Midnight-Miqu-70B-v1.5\nAvailable Model Group Fallbacks=None",
    "type": null,
    "param": null,
    "code": "500"
  }
}
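
One possible workaround (a sketch, not an official client feature) is to catch the 500 response and retry with the system prompt folded into the first user message. Whether the merged prompt behaves acceptably depends on the model, so treat this as a starting point.

import requests

API_KEY = "YOUR_API_KEY"
URL = "https://api.totalgpt.ai/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

def chat(model, system_prompt, user_prompt):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
    resp = requests.post(URL, headers=HEADERS, json={"model": model, "messages": messages})
    if resp.status_code == 500:
        # Fallback for models that reject a separate system role:
        # merge the system prompt into the user turn and retry once.
        merged = [{"role": "user", "content": f"{system_prompt}\n\n{user_prompt}"}]
        resp = requests.post(URL, headers=HEADERS, json={"model": model, "messages": merged})
    resp.raise_for_status()
    # Assumes an OpenAI-compatible response shape
    return resp.json()["choices"][0]["message"]["content"]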


Text Completions #

The Text Completions endpoint generates text based on your input prompt. Customize the output by adjusting parameters like length, creativity, and repetition control.

Request to a Model #

Request: #

POST https://api.totalgpt.ai/v1/completions

Headers: #

{
  "Authorization": "Bearer YOUR_API_KEY",
  "Content-Type": "application/json"
}


Example Request (JSON): #

{
  "model": "TheDrummer-UnslopNemo-12B-v4.1",
  "prompt": "Generate a poem about programming",
  "max_tokens": 7000,
  "temperature": 0.7,
  "top_k": 40,
  "repetition_penalty": 1.2
}
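
A matching Python sketch for the text-completions endpoint, again assuming the requests package and an OpenAI-compatible response where the generated text is at choices[0].text:

import requests

API_KEY = "YOUR_API_KEY"

payload = {
    "model": "TheDrummer-UnslopNemo-12B-v4.1",
    "prompt": "Generate a poem about programming",
    "max_tokens": 7000,
    "temperature": 0.7,
    "top_k": 40,
    "repetition_penalty": 1.2,
}

resp = requests.post(
    "https://api.totalgpt.ai/v1/completions",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json=payload,
)
resp.raise_for_status()

# Assumes an OpenAI-compatible response: generated text sits at choices[0].text
print(resp.json()["choices"][0]["text"])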


Supported Parameters: #

    • n: Number of output sequences to return for the given prompt.
    • best_of: Number of output sequences generated from the prompt. From these, the top n sequences are returned. Must be greater than or equal to n.
    • presence_penalty: Float that penalizes new tokens based on whether they already appear in the generated text.
    • frequency_penalty: Float that penalizes new tokens based on their frequency in the generated text.
    • repetition_penalty: Float that penalizes new tokens based on whether they appear in the prompt or the generated text.
    • temperature: Controls randomness; lower values make the output more deterministic.
    • top_p: Cumulative probability of top tokens to consider. Set to 1 to consider all tokens.
    • top_k: Number of top tokens to consider. Set to -1 to consider all tokens.
    • min_p: Minimum probability for a token to be considered relative to the most likely token.
    • seed: Random seed for generation.
    • stop: List of strings that stop the generation.
    • max_tokens: Maximum number of tokens to generate per output sequence.
    • min_tokens: Minimum number of tokens to generate before EOS or stop_token_ids.
    • detokenize: Whether to detokenize the output.
    • skip_special_tokens: Whether to skip special tokens in the output.
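
As an illustration, several of these parameters can be combined in a single completions payload. The values below are arbitrary and only show how the fields fit together; this dictionary would be sent as the JSON body of the request (for example via the json= argument of requests.post).

payload = {
    "model": "TheDrummer-UnslopNemo-12B-v4.1",
    "prompt": "Write a haiku about compilers",
    "n": 2,                    # return 2 sequences...
    "best_of": 4,              # ...picked from 4 generated candidates (best_of >= n)
    "max_tokens": 64,
    "min_tokens": 10,
    "temperature": 0.8,
    "top_p": 0.95,
    "top_k": -1,               # -1 = consider all tokens
    "min_p": 0.05,
    "presence_penalty": 0.3,
    "frequency_penalty": 0.3,
    "repetition_penalty": 1.1,
    "seed": 42,                # fixed seed for reproducible sampling
    "stop": ["\n\n"],
    "skip_special_tokens": True,
}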


Token Counting #

The Token Counting endpoint helps manage API usage by calculating the number of tokens in your input text.

Request: #

POST https://api.totalgpt.ai/utils/token_counter

{
  "model": "L3.1-70B-Euryale-v2.2-FP8-Dynamic",
  "prompt": "Your prompt"
}

Headers: #

{
  "Authorization": "Bearer YOUR_API_KEY",
  "Content-Type": "application/json"
}

Response: #

{
  "total_tokens": 22,
  "request_model": "L3.1-70B-Euryale-v2.2-FP8-Dynamic",
  "model_used": "Infermatic/L3.1-70B-Euryale-v2.2-FP8-Dynamic",
  "tokenizer_type": "huggingface_tokenizer"
}
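
A small Python sketch for counting tokens before sending a request (assuming the requests package):

import requests

API_KEY = "YOUR_API_KEY"

resp = requests.post(
    "https://api.totalgpt.ai/utils/token_counter",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={
        "model": "L3.1-70B-Euryale-v2.2-FP8-Dynamic",
        "prompt": "Your prompt",
    },
)
resp.raise_for_status()

# Prints the token count, e.g. 22 for the documented example
print(resp.json()["total_tokens"])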

Models Endpoint (Embedding) #

POST https://api.totalgpt.ai/v1/embeddings

Headers: #

{
  "Authorization": "Bearer YOUR_API_KEY",
  "Content-Type": "application/json"
}

Request Body: #

{
  "input": ["Hello world"],
  "model": "intfloat-multilingual-e5-base"
}

Response: #

{
  "model": "intfloat/multilingual-e5-base",
  "data": [
    {
      "embedding": [0.0185, 0.0364, -0.0030, ...],
      "index": 0,
      "object": "embedding"
    }
  ],
  "object": "list",
  "usage": {
    "prompt_tokens": 4,
    "total_tokens": 4
  }
}
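
A final sketch requests embeddings for two strings and compares them with cosine similarity. The similarity computation is illustrative post-processing, not part of the API; one request is made per input string, matching the documented request body.

import math
import requests

API_KEY = "YOUR_API_KEY"

def embed(text):
    # Request the embedding vector for a single input string
    resp = requests.post(
        "https://api.totalgpt.ai/v1/embeddings",
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
        json={"input": [text], "model": "intfloat-multilingual-e5-base"},
    )
    resp.raise_for_status()
    return resp.json()["data"][0]["embedding"]

a = embed("Hello world")
b = embed("Hola mundo")

# Cosine similarity between the two embedding vectors
dot = sum(x * y for x, y in zip(a, b))
norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
print(dot / norm)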

Additional Information #

    • Community Support: Join our Discord server to engage with other developers and share feedback.
    • Model Hosting: Models are hosted using the efficient vLLM backend. Learn more in the vLLM Documentation.