GPU / CPU
Fastino offers two types of model-serving endpoints—CPU and GPU—designed to support different usage patterns and performance needs.
CPU Endpoints
CPU endpoints are the default option for Build (Free) tier users and provide a cost-effective, energy-efficient environment for lightweight, experimental, or offline workloads where latency and throughput are less critical.
Key details:
✅ Available on Build, Pro, and Team plans
✅ Energy efficient
❌ Batching not supported – only one input per request is allowed
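Because CPU endpoints accept only one input per request, a client holding several inputs has to send them one at a time. The following is a minimal sketch of that splitting step. The `build_cpu_payloads` helper and the `{"input": ...}` payload shape are hypothetical placeholders for illustration and are not taken from the Fastino API.

```python
from typing import Dict, List


def build_cpu_payloads(inputs: List[str]) -> List[Dict[str, str]]:
    """Return one request payload per input, since CPU endpoints do not batch.

    The {"input": ...} shape is an assumed placeholder, not the documented
    request schema.
    """
    return [{"input": text} for text in inputs]


payloads = build_cpu_payloads(["first text", "second text", "third text"])
print(len(payloads))  # one payload per input
```

Each payload would then be sent as its own request, so total processing time on a CPU endpoint grows with the number of inputs.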
GPU Endpoints
GPU endpoints deliver high-throughput, low-latency performance and are exclusively available to Pro and Team users. They’re ideal for real-time production use cases and high-volume processing.
Key details:
✅ Available only on Pro and Team plans
✅ Supports both batched and single-input requests
✅ Lower latency
✅ Higher throughput
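The plan availability and batching rules above can be sketched as a small client-side check. This is an illustration under stated assumptions: the lowercase plan keys, the `choose_endpoint` and `chunk` helpers, and the batch size are all hypothetical, and this page does not document a maximum batch size.

```python
from typing import List

# Plan availability as described on this page; the lowercase keys are illustrative.
ENDPOINTS_BY_PLAN = {
    "build": ["cpu"],
    "pro": ["cpu", "gpu"],
    "team": ["cpu", "gpu"],
}


def choose_endpoint(plan: str, needs_batching: bool) -> str:
    """Pick an endpoint type for a plan, using GPU only when batching is needed."""
    available = ENDPOINTS_BY_PLAN[plan.lower()]
    if needs_batching:
        if "gpu" not in available:
            raise ValueError("Batching requires a GPU endpoint (Pro or Team plan).")
        return "gpu"
    return "cpu"


def chunk(inputs: List[str], batch_size: int) -> List[List[str]]:
    """Split inputs into batches of at most batch_size (an assumed limit)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [inputs[i:i + batch_size] for i in range(0, len(inputs), batch_size)]


print(choose_endpoint("Pro", needs_batching=True))
print(chunk(["a", "b", "c"], batch_size=2))
```

Defaulting to CPU when batching is not required is one possible policy. A latency-sensitive application on a Pro or Team plan might prefer to route every request to a GPU endpoint instead.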