Models Endpoint #
Request: #
GET https://api.totalgpt.ai/v1/models
Headers: #
{ "Authorization": "Bearer YOUR_API_KEY" }
Response #
{ "data": [ { "id": "Sao10K-L3.3-70B-Euryale-v2.3-FP8-Dynamic", "object": "model", "created": 1677610602, "owned_by": "openai" }, ... ]
Chat Completions #
Given a list of messages comprising a conversation, the model will return a response. Customize the output by adjusting parameters like length, creativity, and repetition control.
Endpoints #
Request to a Model #
Request: #
POST https://api.totalgpt.ai/v1/chat/completions
Headers: #
{ "Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json" }
#
Example Request (JSON): #
{
  "model": "Sao10K-72B-Qwen2.5-Kunou-v1-FP8-Dynamic",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant who answers concisely." },
    { "role": "user", "content": "Hello!" }
  ],
  "max_tokens": 7000,
  "temperature": 0.7,
  "top_k": 40,
  "repetition_penalty": 1.2
}
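The same request issued from Python with the requests library; reading choices[0].message.content assumes the usual OpenAI-compatible response shape:

import requests

API_KEY = "YOUR_API_KEY"

payload = {
    "model": "Sao10K-72B-Qwen2.5-Kunou-v1-FP8-Dynamic",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant who answers concisely."},
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 7000,
    "temperature": 0.7,
    "top_k": 40,
    "repetition_penalty": 1.2,
}

resp = requests.post(
    "https://api.totalgpt.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()

# Assumes an OpenAI-compatible response body
print(resp.json()["choices"][0]["message"]["content"])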
Handling Errors for Unsupported System Prompts #
This error occurs when you send a system prompt to a model that does not support one. The following models cannot process requests that include a system message (a possible client-side workaround is sketched after the error example below):
- TheDrummer-Rocinante-12B-v1.1
- Midnight-Miqu-70B-v1.5
- TheDrummer-Anubis-70B-v1-FP8-Dynamic
{
  "error": {
    "message": "litellm.APIError: APIError: OpenAIException - Internal Server Error\nReceived Model Group=Midnight-Miqu-70B-v1.5\nAvailable Model Group Fallbacks=None",
    "type": null,
    "param": null,
    "code": "500"
  }
}
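One possible client-side workaround, shown as a hedged sketch (the merging strategy below is an assumption, not an official recommendation): detect the 500 response and retry once with the system text folded into the first user message.

import requests

API_URL = "https://api.totalgpt.ai/v1/chat/completions"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"}

def chat(payload):
    resp = requests.post(API_URL, headers=HEADERS, json=payload, timeout=60)
    has_system = any(m["role"] == "system" for m in payload["messages"])
    if resp.status_code == 500 and has_system:
        # Hypothetical workaround: fold the system prompt into the first user
        # message and retry once without a system role.
        system_text = "\n".join(m["content"] for m in payload["messages"] if m["role"] == "system")
        messages = [m for m in payload["messages"] if m["role"] != "system"]
        if messages and messages[0]["role"] == "user":
            messages[0] = {"role": "user", "content": f"{system_text}\n\n{messages[0]['content']}"}
        resp = requests.post(API_URL, headers=HEADERS, json={**payload, "messages": messages}, timeout=60)
    resp.raise_for_status()
    return resp.json()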
Text Completions #
The Text Completions endpoint generates text based on your input prompt. Customize the output by adjusting parameters like length, creativity, and repetition control.
Request to a Model #
Request: #
POST https://api.totalgpt.ai/v1/completions
Headers: #
{ "Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json" }
Example Request (JSON): #
{ "model": "TheDrummer-UnslopNemo-12B-v4.1", "prompt": "Generate a poem about programming", "max_tokens": 7000, "temperature": 0.7, "top_k": 40, "repetition_penalty": 1.2 }
Supported Parameters: #
- n: Number of output sequences to return for the given prompt.
- best_of: Number of output sequences generated from the prompt. From these, the top n sequences are returned. Must be greater than or equal to n.
- presence_penalty: Float penalizing new tokens based on appearance in the generated text.
- frequency_penalty: Float penalizing new tokens based on their frequency in the generated text.
- repetition_penalty: Float penalizing new tokens based on appearance in the prompt and generated text.
- temperature: Controls randomness; lower values make it deterministic.
- top_p: Cumulative probability of top tokens to consider. Set to 1 to consider all tokens.
- top_k: Number of top tokens to consider. Set to -1 to consider all tokens.
- min_p: Minimum probability for a token to be considered relative to the most likely token.
- seed: Random seed for generation.
- stop: List of strings that stop the generation.
- max_tokens: Maximum number of tokens to generate per output sequence.
- min_tokens: Minimum number of tokens to generate before EOS or stop_token_ids.
- detokenize: Whether to detokenize the output.
- skip_special_tokens: Whether to skip special tokens in the output.
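The parameters above map directly onto the request body. A hedged example combining several of them (the values are arbitrary, and whether a given model honors every field is an assumption about the vLLM-backed deployment):

payload = {
    "model": "TheDrummer-UnslopNemo-12B-v4.1",
    "prompt": "Write a haiku about version control.",
    "n": 2,                      # return 2 sequences
    "best_of": 4,                # sample 4, keep the top 2 (best_of >= n)
    "presence_penalty": 0.3,
    "frequency_penalty": 0.2,
    "repetition_penalty": 1.1,
    "temperature": 0.8,
    "top_p": 0.95,
    "top_k": 40,
    "min_p": 0.05,
    "seed": 42,
    "stop": ["\n\n"],
    "max_tokens": 128,
    "min_tokens": 8,
    "skip_special_tokens": True,
}
# POST this payload as JSON to https://api.totalgpt.ai/v1/completions as shown above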
Token Counting #
The Token Counting endpoint helps manage API usage by calculating the number of tokens in your input text.
Request: #
POST https://api.totalgpt.ai/utils/token_counter
Headers: #
{ "Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json" }
Example Request (JSON): #
{ "model": "L3.1-70B-Euryale-v2.2-FP8-Dynamic", "prompt": "Your prompt" }
Response: #
{ "total_tokens": 22, "request_model": "L3.1-70B-Euryale-v2.2-FP8-Dynamic", "model_used": "Infermatic/L3.1-70B-Euryale-v2.2-FP8-Dynamic", "tokenizer_type": "huggingface_tokenizer" }
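A minimal sketch of the same call from Python (the requests dependency is an assumption; the endpoint, body, and response fields are as documented above):

import requests

API_KEY = "YOUR_API_KEY"

resp = requests.post(
    "https://api.totalgpt.ai/utils/token_counter",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={"model": "L3.1-70B-Euryale-v2.2-FP8-Dynamic", "prompt": "Your prompt"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["total_tokens"])  # e.g. 22 for the sample response above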
Additional Information #
- Community Support: Join our Discord server to engage with other developers and share feedback.
- Model Hosting: Models are hosted using the efficient vLLM backend. Learn more in the vLLM Documentation.