Model Providers
The model provider system manages LLM models across multiple backends. The DynamicModelRegistry auto-discovers Spring AI ChatModel beans and custom ModelProvider implementations, while the DynamicModelSelectionStrategy resolves the runtime model for each request from explicit model hints, tier mapping, availability, and configured fallbacks.
Overview
Contexa supports multiple LLM providers simultaneously. At startup, the DynamicModelRegistry discovers all available models through three mechanisms:
- ModelProvider beans — custom provider implementations registered in the Spring context.
- Spring AI ChatModel beans — auto-detected ChatModel beans, typically created from contexa.llm.chat.ollama.*, spring.ai.anthropic.*, or spring.ai.openai.* configuration.
- TieredLLMProperties — model IDs from the configured layer-1 and layer-2 hierarchy.
The registry automatically infers the provider from the model class name (Ollama, Anthropic, OpenAI, Gemini, Mistral, Azure, Bedrock, HuggingFace) and performs health checks at initialization.
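The class-name inference can be pictured as a case-insensitive substring match over known provider markers. This is a sketch under assumptions: the `ProviderInference` class, its marker ordering, and the `"unknown"` fallback are illustrative, not the registry's actual internals.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of class-name-based provider inference; the actual
// registry's matching rules may differ.
class ProviderInference {

    // Ordered markers: "azure" must precede "openai" so a class like
    // AzureOpenAiChatModel resolves to azure rather than openai.
    private static final Map<String, String> MARKERS = new LinkedHashMap<>();
    static {
        MARKERS.put("ollama", "ollama");
        MARKERS.put("anthropic", "anthropic");
        MARKERS.put("azure", "azure");
        MARKERS.put("openai", "openai");
        MARKERS.put("gemini", "gemini");
        MARKERS.put("mistral", "mistral");
        MARKERS.put("bedrock", "bedrock");
        MARKERS.put("huggingface", "huggingface");
    }

    /** Infers a provider id from a ChatModel implementation class name. */
    static String inferProvider(String className) {
        String lower = className.toLowerCase();
        for (Map.Entry<String, String> entry : MARKERS.entrySet()) {
            if (lower.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return "unknown";
    }
}
```

Ordering the markers matters because some Spring AI class names contain more than one provider keyword.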
ModelProvider
The interface for custom model provider implementations. Implement this to add support for providers not covered by Spring AI auto-configuration.
public interface ModelProvider
Implementations return a ChatModel instance for the given descriptor; the config map provides runtime overrides.

ModelDescriptor
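A custom provider might look like the following. This is a minimal, self-contained sketch: the real ModelProvider method names and signatures are not shown in this document, so `SimpleModelProvider` and `SimpleChatModel` are illustrative stand-ins for the actual Spring AI and Contexa types.

```java
import java.util.List;
import java.util.Map;

// Placeholder for Spring AI's ChatModel so the sketch is self-contained.
interface SimpleChatModel {
    String call(String prompt);
}

// Illustrative stand-in for the ModelProvider contract; the real interface's
// method names and signatures may differ.
interface SimpleModelProvider {
    String providerName();
    List<String> modelIds();
    SimpleChatModel createChatModel(String modelId, Map<String, Object> config);
}

// A hypothetical provider wrapping an in-house model.
class EchoModelProvider implements SimpleModelProvider {
    @Override public String providerName() { return "echo"; }
    @Override public List<String> modelIds() { return List.of("echo-1"); }

    @Override
    public SimpleChatModel createChatModel(String modelId, Map<String, Object> config) {
        // The config map supplies runtime overrides, as described above.
        String prefix = (String) config.getOrDefault("prefix", "echo: ");
        return prompt -> prefix + prompt;
    }
}
```

Registering such an implementation as a Spring bean is what lets the DynamicModelRegistry discover it at startup.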
Describes a model's identity, capabilities, default options, and current status.
@Data @Builder
public class ModelDescriptor
| Property | Type | Description |
|---|---|---|
| modelId | String | Unique model identifier (e.g., "llama3.1:8b", "claude-3-opus"). |
| displayName | String | Human-readable model name. |
| provider | String | Provider name (ollama, anthropic, openai). |
| tier | Integer | The configured runtime tier assigned to the model (1 or 2); null if unassigned. |
| version | String | Model version string. |
| capabilities | ModelCapabilities | What the model supports (streaming, tool calling, multimodal, context window, output budget). |
| options | ModelOptions | Default sampling options (temperature, topP, topK, repetitionPenalty). |
| status | ModelStatus | AVAILABLE or UNAVAILABLE. |
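The descriptor's shape can be approximated as a record. This is a standalone sketch only: the real class uses Lombok's @Data/@Builder, and the capabilities and options fields are omitted here to keep the example self-contained.

```java
// Mirrors the status column above.
enum ModelStatus { AVAILABLE, UNAVAILABLE }

// Standalone approximation of ModelDescriptor's shape (not the real class).
record ModelDescriptorSketch(
        String modelId,      // e.g. "llama3.1:8b"
        String displayName,
        String provider,     // e.g. "ollama"
        Integer tier,        // 1 or 2; null if unassigned
        String version,
        ModelStatus status) {
}

class DescriptorDemo {
    static ModelDescriptorSketch sample() {
        return new ModelDescriptorSketch(
                "llama3.1:8b", "Llama 3.1 8B", "ollama", 2, "3.1",
                ModelStatus.AVAILABLE);
    }
}
```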
ModelCapabilities
| Field | Type | Default |
|---|---|---|
| streaming | boolean | true |
| toolCalling | boolean | false |
| functionCalling | boolean | false |
| multiModal | boolean | false |
| maxTokens | int | 4096 |
| contextWindow | int | 4096 |
| maxOutputTokens | int | 4096 |
ModelOptions
| Field | Type | Default |
|---|---|---|
| temperature | Double | 0.7 |
| topP | Double | 0.9 |
| topK | Integer | null |
| repetitionPenalty | Double | 1.0 |
DynamicModelRegistry
Central registry that discovers, manages, and provides access to all LLM models. Auto-initializes at application startup.
public class DynamicModelRegistry
The registry:
- Resolves the ChatModel for a given model ID, creating and caching the instance if not already loaded; throws ModelSelectionException if the ID is not found.
- Returns AVAILABLE model descriptors, filtered by provider name.
- Calls shutdown() on registered providers during application shutdown.

ModelSelectionStrategy
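The create-and-cache behavior can be sketched with a ConcurrentHashMap. The names below (`ChatModelCache`, `getModel`) are illustrative, not the registry's actual API, and a plain Object stands in for ChatModel so the sketch is self-contained.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of the registry's lazy create-and-cache behavior.
class ChatModelCache {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();
    private final Set<String> knownIds;                // discovered model IDs
    private final Function<String, Object> factory;    // e.g. delegates to a ModelProvider

    ChatModelCache(Set<String> knownIds, Function<String, Object> factory) {
        this.knownIds = knownIds;
        this.factory = factory;
    }

    /** Returns the cached instance, creating it on first access. */
    Object getModel(String modelId) {
        if (!knownIds.contains(modelId)) {
            // The real registry throws ModelSelectionException here.
            throw new IllegalArgumentException("Unknown model: " + modelId);
        }
        return cache.computeIfAbsent(modelId, factory);
    }
}
```

computeIfAbsent gives the thread-safe "create once, reuse afterwards" semantics the registry needs when multiple requests ask for the same model concurrently.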
Interface for model selection logic. The DynamicModelSelectionStrategy is the default implementation.
public interface ModelSelectionStrategy
DynamicModelSelectionStrategy
The default selection strategy uses an explicit-model-first fallback chain:
public class DynamicModelSelectionStrategy implements ModelSelectionStrategy
- Explicit model request — ExecutionContext.preferredModel or metadata keys such as requestedModelId, preferredModel, runtimeModelId, and modelId are tried first.
- Tier resolution — If no explicit model is present, the strategy resolves a semantic tier from analysisLevel, then securityTaskType, then ExecutionContext.tier. Values above 1 are normalized to the configured layer 2.
- Primary/backup lookup — The strategy uses TieredLLMProperties to try the primary model for the resolved layer, then its backup model.
- Primary ChatModel fallback — If tier resolution fails, the auto-configured primary ChatModel bean is used as the last fallback.
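The fallback chain above can be sketched as plain Java. This is a simplified illustration: the `tier` parameter stands for the already-resolved semantic tier (the real strategy derives it from analysisLevel, securityTaskType, and ExecutionContext.tier), and availability checks are reduced to null checks.

```java
import java.util.Map;

// Sketch of the explicit-model-first fallback chain. Key names follow the
// steps above; the real strategy reads ExecutionContext and TieredLLMProperties.
class SelectionSketch {

    static final String[] METADATA_KEYS =
            { "requestedModelId", "preferredModel", "runtimeModelId", "modelId" };

    static String select(String preferredModel, Map<String, String> metadata,
                         Integer tier, Map<Integer, String> primaryByLayer,
                         Map<Integer, String> backupByLayer, String primaryFallback) {
        // 1. Explicit model request wins.
        if (preferredModel != null) {
            return preferredModel;
        }
        for (String key : METADATA_KEYS) {
            if (metadata.containsKey(key)) {
                return metadata.get(key);
            }
        }
        // 2. Tier resolution: values above 1 normalize to configured layer 2.
        if (tier != null) {
            int layer = tier <= 1 ? 1 : 2;
            // 3. Primary model for the layer, then its backup.
            String model = primaryByLayer.get(layer);
            if (model == null) {
                model = backupByLayer.get(layer);
            }
            if (model != null) {
                return model;
            }
        }
        // 4. Last resort: the auto-configured primary ChatModel.
        return primaryFallback;
    }
}
```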
Configuration
Tiered Model Hierarchy
```yaml
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:14b
        backup:
          model: qwen2.5:7b
      layer2:
        model: exaone3.5:latest
        backup:
          model: llama3.1:8b
    tiered:
      layer1:
        timeout-ms: 30000
      layer2:
        timeout-ms: 60000

contexa:
  llm:
    chat:
      ollama:
        base-url: http://localhost:11434
        model: qwen2.5:14b
```
Selection order: explicit model request → analysisLevel → securityTaskType → explicit tier → primary ChatModel fallback. Only layers 1 and 2 are configured at runtime, so higher semantic tiers are normalized to layer 2.