Text Completions #
The Text Completions endpoint generates text from your input prompt. Customize the output by adjusting parameters that control length, randomness, and repetition.
Endpoints #
Models Endpoint #
Request: #
GET https://api.totalgpt.ai/v1/models
Headers: #
{ "Authorization": "Bearer YOUR_API_KEY" }
Response: #
{ "data": [ { "id": "Sao10K-L3.3-70B-Euryale-v2.3-FP8-Dynamic", "object": "model", "created": 1677610602, "owned_by": "openai" }, ... ]
Request to a Model #
Request: #
POST https://api.totalgpt.ai/v1/completions
Headers: #
{ "Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json" }
Example Request (JSON): #
{ "model": "TheDrummer-UnslopNemo-12B-v4.1", "prompt": "Generate a Bible question with four multiple-choice answers (one correct, three incorrect).", "max_tokens": 7000, "temperature": 0.7, "top_k": 40, "repetition_penalty": 1.2 }
Example Request (cURL): #
curl -X POST https://api.totalgpt.ai/v1/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "TheDrummer-UnslopNemo-12B-v4.1",
    "prompt": "Generate a Bible question with four multiple-choice answers (one correct, three incorrect).",
    "max_tokens": 7000,
    "temperature": 0.7,
    "top_k": 40,
    "repetition_penalty": 1.2
  }'
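Example Request (Python): #
The same request as a Python sketch using the requests library. The "choices" field access assumes an OpenAI-compatible completions response, which is not shown above; adjust it if the actual payload differs:

import requests

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your actual key

payload = {
    "model": "TheDrummer-UnslopNemo-12B-v4.1",
    "prompt": "Generate a Bible question with four multiple-choice answers "
              "(one correct, three incorrect).",
    "max_tokens": 7000,
    "temperature": 0.7,
    "top_k": 40,
    "repetition_penalty": 1.2,
}

resp = requests.post(
    "https://api.totalgpt.ai/v1/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json=payload,  # requests serializes the dict as the JSON request body
)
resp.raise_for_status()

# Assumption: an OpenAI-style response with a "choices" array of completions.
print(resp.json()["choices"][0]["text"])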
Supported Parameters: #
- n: Number of output sequences to return for the given prompt.
- best_of: Number of output sequences generated from the prompt. From these, the top n sequences are returned. Must be greater than or equal to n.
- presence_penalty: Float penalizing new tokens based on whether they already appear in the generated text.
- frequency_penalty: Float penalizing new tokens based on how frequently they appear in the generated text.
- repetition_penalty: Float penalizing new tokens based on whether they appear in the prompt or the generated text.
- temperature: Controls randomness; lower values make the output more deterministic.
- top_p: Cumulative probability of top tokens to consider. Set to 1 to consider all tokens.
- top_k: Number of top tokens to consider. Set to -1 to consider all tokens.
- min_p: Minimum probability for a token to be considered relative to the most likely token.
- seed: Random seed for generation.
- stop: List of strings that stop the generation.
- bad_words: List of words not allowed to be generated.
- max_tokens: Maximum number of tokens to generate per output sequence.
- min_tokens: Minimum number of tokens to generate before EOS or stop_token_ids.
- detokenize: Whether to detokenize the output.
- skip_special_tokens: Whether to skip special tokens in the output.
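As an illustration only, the Python-style payload sketch below combines several of these parameters in one request. The model name is taken from the example above; the prompt and all values are invented placeholders, not recommendations:

# Hypothetical payload exercising several sampling controls at once.
payload = {
    "model": "TheDrummer-UnslopNemo-12B-v4.1",
    "prompt": "Write a short poem about tokenizers.",  # invented prompt
    "n": 2,                     # return two completions...
    "best_of": 4,               # ...picked from four candidates (best_of >= n)
    "temperature": 0.8,         # moderate randomness
    "top_p": 0.9,               # nucleus sampling over 90% of probability mass
    "top_k": 40,                # also cap candidates at the 40 most likely tokens
    "min_p": 0.05,              # drop tokens below 5% of the top token's probability
    "repetition_penalty": 1.1,  # mildly discourage repeats from prompt and output
    "seed": 42,                 # fixed seed for reproducible sampling
    "stop": ["\n\n"],           # end generation at the first blank line
    "max_tokens": 120,
    "min_tokens": 16,           # generate at least 16 tokens before stopping
}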
Token Counting #
The Token Counting endpoint helps manage API usage by calculating the number of tokens in your input text.
Request: #
POST https://api.totalgpt.ai/utils/token_counter
{ "model":"L3.1-70B-Euryale-v2.2-FP8-Dynamic", "prompt":"Your prompt" }
Headers: #
{ "Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json" }
Response: #
{
  "total_tokens": 22,
  "request_model": "L3.1-70B-Euryale-v2.2-FP8-Dynamic",
  "model_used": "Infermatic/L3.1-70B-Euryale-v2.2-FP8-Dynamic",
  "tokenizer_type": "huggingface_tokenizer"
}
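Example (Python): #
A minimal sketch of the token-counting call in Python; the printed field follows the response shown above:

import requests

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your actual key

resp = requests.post(
    "https://api.totalgpt.ai/utils/token_counter",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "L3.1-70B-Euryale-v2.2-FP8-Dynamic",
        "prompt": "Your prompt",
    },
)
resp.raise_for_status()
print(resp.json()["total_tokens"])  # e.g. 22 for the example above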
Additional Information #
- Community Support: Join our Discord server to engage with other developers and share feedback.
- Model Hosting: Models are hosted using the efficient vLLM backend. Learn more in the vLLM Documentation.