Simple Guide to Convert an FP16 Model to FP8 Overview This simple guide to quant models walks you through converting a model from FP16 to FP8, an 8-bit data format that significantly improves model inference efficiency without sacrificing output quality. FP8 is ideal for quantizing large language models (LLMs), ensuring faster and more cost-effective deployments. […]
Category Archives: Guides
SillyTavern is one of the most popular interfaces to interact with LLMs. We have been working on developing an API and one of the first interfaces we wanted to integrate with was SillyTavern. We have done just that. Requirements: Infermatic.ai Plus Tier subscription ($15/month) Steps to integrate: After you subscribe to Infermatic.ai you can generate […]