Developer Protocol

Access our high-throughput Sovereign Intelligence network through a unified API gateway, designed for enterprise-scale inference at minimal latency.

Quick Start Initialization

The VORTEX AI API follows an OpenAI-compatible architecture, so you can drop our Intelligence Layer into any existing codebase by changing just two lines of configuration: the API key and the base URL.

Python SDK
```python
import openai

client = openai.OpenAI(
    api_key="VORTEX_ACCESS_KEY",
    base_url="https://api.vortexaillm.com/v1",
)

response = client.chat.completions.create(
    model="vortex-intelligence-pro",
    messages=[{"role": "user", "content": "Analyze node health."}],
)
```

Intelligence Endpoints

All traffic is routed through our sovereign edge network. We support streaming responses and function calling across all primary model nodes.

POST /v1/chat/completions
GET /v1/models
POST /v1/embeddings
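As a sketch of the endpoints above, assuming the OpenAI-compatible client from the quick start: `GET /v1/models` maps to `client.models.list()`, and passing `stream=True` to `POST /v1/chat/completions` delivers the reply as incremental deltas. The `join_stream` helper and the prompt text are illustrative, not part of the API.

```python
def join_stream(deltas):
    """Concatenate streamed content deltas, skipping empty keep-alive chunks."""
    return "".join(d for d in deltas if d)


if __name__ == "__main__":
    import openai

    client = openai.OpenAI(
        api_key="VORTEX_ACCESS_KEY",
        base_url="https://api.vortexaillm.com/v1",
    )

    # GET /v1/models: list the model nodes your key can reach.
    for model in client.models.list():
        print(model.id)

    # POST /v1/chat/completions with stream=True: tokens arrive as deltas.
    stream = client.chat.completions.create(
        model="vortex-intelligence-pro",
        messages=[{"role": "user", "content": "Summarize node health."}],
        stream=True,
    )
    deltas = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
        deltas.append(delta)
    reply = join_stream(deltas)
```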

Security & Rate Protocols

Free Tier access is restricted to 1,000 tokens per minute. For Enterprise throughput levels, please review your active contracts in the settings panel or contact support.
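When a Free Tier request exceeds the per-minute budget, the gateway is expected to reject it as a rate-limit error; a common client-side pattern is exponential backoff. This is a sketch, not an official SDK recipe: the delay schedule (`base`, `cap`, retry count) is an assumption you should tune to your own traffic.

```python
import time


def backoff_delays(max_retries=5, base=0.5, cap=30.0):
    """Delays in seconds for successive retries: 0.5, 1, 2, ... capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


if __name__ == "__main__":
    import openai

    client = openai.OpenAI(
        api_key="VORTEX_ACCESS_KEY",
        base_url="https://api.vortexaillm.com/v1",
    )
    for delay in backoff_delays():
        try:
            response = client.chat.completions.create(
                model="vortex-intelligence-pro",
                messages=[{"role": "user", "content": "ping"}],
            )
            break
        except openai.RateLimitError:
            # Free Tier: 1,000 tokens per minute. Back off, then retry.
            time.sleep(delay)
```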

Smartness, fine-tuning, and “skills”

Routing and behavior without training: Most of what operators call “smartness” comes from the gateway stack (litellm_config.yaml, model choice, temperature, max tokens, and the system prompts your app sends). Tighten prompts, constrain output formats (JSON schema or bullet lists), and pick the smallest model that clears your quality bar; latency and cost follow from that decision.
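The advice above can be sketched as a single constrained request: a low temperature, a tight token budget, a system prompt that pins the output shape, and a small parse-and-validate step on the way out. The model name `vortex-intelligence-lite`, the `response_format` hint (which not every node may honor), and the helper are illustrative assumptions.

```python
import json


def parse_constrained_reply(text, required_keys):
    """Parse a reply that was asked to be strict JSON; fail fast on missing keys."""
    data = json.loads(text)
    missing = [k for k in required_keys if k not in data]
    if missing:
        raise ValueError(f"model omitted keys: {missing}")
    return data


if __name__ == "__main__":
    import openai

    client = openai.OpenAI(
        api_key="VORTEX_ACCESS_KEY",
        base_url="https://api.vortexaillm.com/v1",
    )
    response = client.chat.completions.create(
        model="vortex-intelligence-lite",          # hypothetical smaller node
        temperature=0,
        max_tokens=200,
        response_format={"type": "json_object"},   # if the node supports it
        messages=[
            {"role": "system",
             "content": 'Reply ONLY with JSON: {"status": str, "issues": [str]}'},
            {"role": "user", "content": "Analyze node health."},
        ],
    )
    report = parse_constrained_reply(
        response.choices[0].message.content, ["status", "issues"])
```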

Fine-tuning: Fine-tuning happens on your chosen base model through your provider or pipeline (training data, eval sets, then a dedicated deployment or a LoRA adapter). Point LiteLLM at the new model name once it is live; nothing changes on this static site beyond updating the docs and the default Chat model.
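As a sketch of what “point LiteLLM at the new model name” can look like in litellm_config.yaml, using LiteLLM’s `model_list` format. The `-ft` model name and the environment-variable key are placeholders, not real deployments.

```yaml
model_list:
  # Existing base model node.
  - model_name: vortex-intelligence-pro
    litellm_params:
      model: openai/vortex-intelligence-pro
      api_base: https://api.vortexaillm.com/v1
      api_key: os.environ/VORTEX_ACCESS_KEY

  # Hypothetical fine-tuned deployment: add it here once it is live,
  # then update the docs and the default Chat model.
  - model_name: vortex-intelligence-pro-ft
    litellm_params:
      model: openai/vortex-intelligence-pro-ft
      api_base: https://api.vortexaillm.com/v1
      api_key: os.environ/VORTEX_ACCESS_KEY
```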

Skills (tools): “Skills” map to tool-calling: define functions the model can invoke (HTTP to your APIs, DB lookups, RAG retrieval). Enable tools in LiteLLM for the models you expose, validate arguments server-side, and return concise results back into the chat loop. That pattern beats stuffing long context windows and is how production agents stay reliable.
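The tool-calling loop described above can be sketched end to end: declare a function schema, let the model request a call, validate the arguments server-side, execute, and feed a concise result back into the conversation. `get_node_health` is a hypothetical backing API, and the schema and prompt are assumptions for illustration.

```python
import json


def get_node_health(node_id: str) -> dict:
    """Hypothetical backing API the model can invoke via tool-calling."""
    return {"node_id": node_id, "status": "healthy"}


def validate_args(raw_json, required):
    """Validate tool-call arguments server-side before executing anything."""
    args = json.loads(raw_json)
    if not all(k in args and isinstance(args[k], str) for k in required):
        raise ValueError("bad tool arguments")
    return args


TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_node_health",
        "description": "Look up health for one node.",
        "parameters": {
            "type": "object",
            "properties": {"node_id": {"type": "string"}},
            "required": ["node_id"],
        },
    },
}]

if __name__ == "__main__":
    import openai

    client = openai.OpenAI(
        api_key="VORTEX_ACCESS_KEY",
        base_url="https://api.vortexaillm.com/v1",
    )
    messages = [{"role": "user", "content": "Is node n-42 healthy?"}]
    response = client.chat.completions.create(
        model="vortex-intelligence-pro", messages=messages, tools=TOOLS)
    msg = response.choices[0].message
    if msg.tool_calls:
        messages.append(msg)
        for call in msg.tool_calls:
            args = validate_args(call.function.arguments, ["node_id"])
            result = get_node_health(**args)
            # Return a concise result back into the chat loop.
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(result)})
        final = client.chat.completions.create(
            model="vortex-intelligence-pro", messages=messages)
```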