Instructions: Open the browser console (F12 or Ctrl+Shift+J) to see the demonstration results. This page runs the tier-demo.js script, which demonstrates how the tier system works for an inference service, including:
| Tier | Description | Pricing | Rate Limits | Infrastructure |
|---|---|---|---|---|
| Basic | For small applications | $10/month minimum + $1.20 per 1M tokens (20% markup) | 500 RPM / 50K TPM | Tela Inference As Service (50K TPM ≈ 1 instance) |
| Standard | For medium applications | $100/month minimum + $1.15 per 1M tokens (15% markup) | 2K RPM / 200K TPM | Tela Inference As Service, priority (200K TPM ≈ 1-2 instances) |
| Professional | For high-volume applications | $500/month minimum + $1.10 per 1M tokens (10% markup) | 5K RPM / 1M TPM | Mixed: Tela Inference As Service + dedicated (5M TPM ≈ 2-4 H100 GPUs) |
| Enterprise | For large enterprises | $2K/month minimum + $1.05 per 1M tokens (5% markup) | 10K RPM / 5M TPM | Dedicated with Tela Inference As Service backup (10M TPM ≈ 4-8 H100 GPUs) |
| Enterprise Plus | Custom enterprise solution | $5K/month minimum + $1.03 per 1M tokens (3% markup) | 30K RPM / 10M TPM | Dedicated reserved capacity (150M TPM ≈ 12-24 H100 GPUs) |

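The pricing scheme in the table (a monthly minimum plus a per-token charge, whichever is larger) can be sketched as follows. This is a minimal illustration of the billing logic; the tier rates come from the table above, but the object and function names are assumptions, not the actual tier-demo.js API.

```javascript
// Tier rates taken from the pricing table above.
const TIERS = {
  basic:          { monthlyMinimum: 10,   perMillionTokens: 1.20 },
  standard:       { monthlyMinimum: 100,  perMillionTokens: 1.15 },
  professional:   { monthlyMinimum: 500,  perMillionTokens: 1.10 },
  enterprise:     { monthlyMinimum: 2000, perMillionTokens: 1.05 },
  enterprisePlus: { monthlyMinimum: 5000, perMillionTokens: 1.03 },
};

// Monthly bill: the usage-based charge, but never below the tier minimum.
function monthlyCost(tierName, tokensUsed) {
  const { monthlyMinimum, perMillionTokens } = TIERS[tierName];
  const usageCost = (tokensUsed / 1_000_000) * perMillionTokens;
  return Math.max(monthlyMinimum, usageCost);
}

// 5M tokens on Basic: ~$6 of usage, below the $10 minimum, so $10 is charged.
monthlyCost("basic", 5_000_000);
// 200M tokens on Standard: usage charge exceeds the $100 minimum.
monthlyCost("standard", 200_000_000);
```

The `Math.max` captures the "minimum + per-token" structure: light months are billed at the floor, heavy months at the metered rate.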
| Model | Basic | Standard | Professional | Enterprise | Enterprise Plus |
|---|---|---|---|---|---|
| Mistral-7B | ✓ | ✓ | ✓ | ✓ | ✓ |
| Llama-3.1-8B | ✓ | ✓ | ✓ | ✓ | ✓ |
| Llama-3.1-70B | ✓ | ✓ | ✓ | ✓ | ✓ |
| Llama-3.1-405B | ✗ | ✓ | ✓ | ✓ | ✓ |
| DeepSeek-R1-671B | ✗ | ✓ | ✓ | ✓ | ✓ |
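Since availability in the matrix above is monotonic (once a model is available at a tier, it stays available at every higher tier), the check can be encoded as a minimum-tier lookup. A minimal sketch, assuming illustrative tier keys and function names rather than the actual tier-demo.js API:

```javascript
// Tiers ordered from lowest to highest.
const TIER_ORDER = ["basic", "standard", "professional", "enterprise", "enterprisePlus"];

// Lowest tier at which each model becomes available (from the matrix above).
const MIN_TIER = {
  "Mistral-7B": "basic",
  "Llama-3.1-8B": "basic",
  "Llama-3.1-70B": "basic",
  "Llama-3.1-405B": "standard",
  "DeepSeek-R1-671B": "standard",
};

// A model is available when the requested tier ranks at or above its minimum tier.
function isModelAvailable(model, tier) {
  const min = MIN_TIER[model];
  if (min === undefined) return false; // unknown model
  return TIER_ORDER.indexOf(tier) >= TIER_ORDER.indexOf(min);
}

isModelAvailable("Llama-3.1-405B", "basic");      // false: 405B requires Standard or above
isModelAvailable("DeepSeek-R1-671B", "standard"); // true
```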