Model Providers
The model provider system manages LLM models across multiple runtimes. The DynamicModelRegistry registers models from the LlmRuntimeCatalog chat bindings and the configured TieredLLMProperties layers, while the DynamicModelSelectionStrategy resolves the runtime model for each request from explicit model hints, tier mapping, and a primary ChatModel fallback.
Overview
Contexa supports multiple LLM runtimes simultaneously. At startup, the DynamicModelRegistry populates its descriptor map through two mechanisms:
- LlmRuntimeCatalog chat bindings: every LlmRuntimeBinding returned from LlmRuntimeCatalog.getChatBindings() is registered. The catalog is built from Spring AI ChatModel beans created by the Contexa runtime (for example via contexa.llm.chat.ollama.*) and by Spring AI autoconfigurations (Anthropic, OpenAI, and others).
- TieredLLMProperties layers: the primary and optional backup models configured for spring.ai.security.layer1 and spring.ai.security.layer2 are registered with their tier. The registry uses resolveCanonicalModelId(...) so the same model can be found by its runtime ID, bean name, or any alias.
The registry infers the provider name from the model ID when no binding is available. Recognised substrings include llama, qwen, gemma, mistral, phi, exaone, codellama, and deepseek (→ ollama), claude (→ anthropic), gpt/o1/davinci (→ openai), gemini/vertex (→ gemini), and bedrock (→ bedrock). Anything else is labelled spring.
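The substring rules above can be sketched as a small lookup. This is only an illustration, not the registry's code: the class name ProviderInferenceSketch, the method inferProvider, the lower-casing of the ID, the order in which providers are checked, and the treatment of null or blank IDs are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of substring-based provider inference; not the Contexa implementation. */
final class ProviderInferenceSketch {

    // Provider -> recognised substrings, in the order the documentation lists them.
    private static final Map<String, List<String>> RULES = new LinkedHashMap<>();
    static {
        RULES.put("ollama", List.of("llama", "qwen", "gemma", "mistral", "phi",
                "exaone", "codellama", "deepseek"));
        RULES.put("anthropic", List.of("claude"));
        RULES.put("openai", List.of("gpt", "o1", "davinci"));
        RULES.put("gemini", List.of("gemini", "vertex"));
        RULES.put("bedrock", List.of("bedrock"));
    }

    /** Returns the first provider whose substring occurs in the model ID, else "spring". */
    static String inferProvider(String modelId) {
        if (modelId == null || modelId.isBlank()) {
            return "spring"; // assumption: missing IDs get the generic label
        }
        String id = modelId.toLowerCase(Locale.ROOT); // assumption: matching ignores case
        for (Map.Entry<String, List<String>> rule : RULES.entrySet()) {
            for (String needle : rule.getValue()) {
                if (id.contains(needle)) {
                    return rule.getKey();
                }
            }
        }
        return "spring";
    }
}
```

Because the match is substring-based, the order of checks matters whenever an ID contains substrings belonging to more than one provider; the order used here simply follows the list in the text.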
LlmRuntimeCatalog
The interface that exposes discovered chat and embedding runtimes. The registry uses it to find bindings and resolve live ChatModel instances on demand.
public interface LlmRuntimeCatalog
- ChatModel for the matching binding.
- EmbeddingModel for the matching binding.
- ChatModel bean when one is present in the context.
- EmbeddingModel bean when one is present.
LlmRuntimeBinding
An immutable record of a single runtime binding. Bindings are created by Contexa's LLM autoconfiguration and consumed by both the catalog and the registry.
public final class LlmRuntimeBinding
| Property | Type | Description |
|---|---|---|
| runtimeId | String | Contexa runtime ID assigned to this binding. |
| beanName | String | Spring bean name of the underlying model bean. |
| provider | String | Provider identifier (for example, ollama, anthropic, openai). |
| modelId | String | Canonical model ID reported by the runtime (for example, llama3.1:8b). |
| aliases | Set<String> | Additional selectors that resolve to the same binding. |
| type | LlmRuntimeType | Chat or embedding runtime type. |
| primary | boolean | Whether this binding is marked as the Spring primary for its type. |
| source | String | Origin of the binding (for example, autoconfiguration class or configuration key). |
Helper Methods
| Method | Return | Description |
|---|---|---|
| canonicalId() | String | Returns the preferred identifier for the binding: modelId, then runtimeId, then beanName. |
| matches(String selector) | boolean | Returns whether the selector matches runtimeId, beanName, modelId, or any alias. |
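A minimal sketch of how the two helpers could behave, using a stand-in record rather than the real LlmRuntimeBinding. The name BindingSketch, the reduced field set, the treatment of blank values as missing, and exact (case-sensitive) matching are assumptions made for illustration.

```java
import java.util.Set;
import java.util.stream.Stream;

/** Hypothetical stand-in for LlmRuntimeBinding with only the fields the helpers need. */
record BindingSketch(String runtimeId, String beanName, String modelId, Set<String> aliases) {

    BindingSketch {
        // Defensive copy keeps the record immutable; a null alias set becomes empty.
        aliases = aliases == null ? Set.of() : Set.copyOf(aliases);
    }

    /** Preferred identifier: modelId, then runtimeId, then beanName (first non-blank wins). */
    String canonicalId() {
        return Stream.of(modelId, runtimeId, beanName)
                .filter(v -> v != null && !v.isBlank())
                .findFirst()
                .orElse(null);
    }

    /** True when the selector equals runtimeId, beanName, modelId, or any alias. */
    boolean matches(String selector) {
        if (selector == null || selector.isBlank()) {
            return false;
        }
        return selector.equals(runtimeId)
                || selector.equals(beanName)
                || selector.equals(modelId)
                || aliases.contains(selector);
    }
}
```

With a binding built as new BindingSketch("ollama-chat", "ollamaChatModel", "llama3.1:8b", Set.of("llama")), all four of those strings are accepted by matches(...), and canonicalId() returns the model ID.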
ModelDescriptor
Describes a model's identity, capabilities, default options, and current status.
@Data @Builder
public class ModelDescriptor
| Property | Type | Description |
|---|---|---|
| modelId | String | Unique model identifier (for example, llama3.1:8b, claude-3-opus). |
| displayName | String | Human-readable model name. |
| provider | String | Provider name (ollama, anthropic, openai). |
| tier | Integer | The configured runtime tier assigned to the model (1 or 2). Null if unassigned. |
| version | String | Model version string. |
| capabilities | ModelCapabilities | Nested class describing streaming, tool calling, multimodal, context window, and output budget. |
| options | ModelOptions | Nested class describing default sampling options (temperature, topP, topK, repetitionPenalty). |
| status | ModelStatus | AVAILABLE or UNAVAILABLE. |
ModelCapabilities (nested class)
| Field | Type | Default |
|---|---|---|
| streaming | boolean | true |
| toolCalling | boolean | false |
| functionCalling | boolean | false |
| multiModal | boolean | false |
| maxTokens | int | 4096 |
| contextWindow | int | 4096 |
| maxOutputTokens | int | 4096 |
ModelOptions (nested class)
| Field | Type | Default |
|---|---|---|
| temperature | Double | 0.7 |
| topP | Double | 0.9 |
| topK | Integer | null |
| repetitionPenalty | Double | 1.0 |
Helper Method
Returns true when the capabilities declare tool calling, function calling, or multimodal support.
DynamicModelRegistry
Central registry that manages descriptors and caches ChatModel instances. It registers models from the catalog and from tier configuration during @PostConstruct initialization.
public class DynamicModelRegistry
Constructor Dependencies
| Dependency | Description |
|---|---|
| ApplicationContext | Spring application context reference used during initialization. |
| TieredLLMProperties | Configured tier hierarchy (layer 1 and layer 2, primary and backup models). |
| LlmRuntimeCatalog | Catalog of Spring AI runtime bindings. Optional: when absent, getModel(...) raises ModelSelectionException. |
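The resolve-then-cache behaviour of getModel(...) might look roughly like the sketch below. It is not the registry's code: ModelCacheSketch, registerAlias, and the resolver function are hypothetical, the type parameter M stands in for Spring AI's ChatModel, and IllegalStateException stands in for ModelSelectionException.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of alias resolution plus instance caching. */
final class ModelCacheSketch<M> {

    private final Map<String, String> aliasToCanonicalId = new ConcurrentHashMap<>();
    private final Map<String, M> instances = new ConcurrentHashMap<>();
    // Stand-in for the optional catalog; null models "no catalog configured".
    private final Function<String, Optional<M>> catalogResolver;

    ModelCacheSketch(Function<String, Optional<M>> catalogResolver) {
        this.catalogResolver = catalogResolver;
    }

    /** Records that a runtime ID, bean name, or alias points at a canonical model ID. */
    void registerAlias(String alias, String canonicalId) {
        aliasToCanonicalId.put(alias, canonicalId);
    }

    /** Resolves the canonical ID, asks the catalog once, and caches the live instance. */
    M getModel(String modelId) {
        if (modelId == null || modelId.isBlank()) {
            throw new IllegalStateException("model ID is missing");
        }
        if (catalogResolver == null) {
            throw new IllegalStateException("no runtime catalog available");
        }
        String canonical = aliasToCanonicalId.getOrDefault(modelId, modelId);
        // computeIfAbsent stores nothing when the mapping function throws.
        return instances.computeIfAbsent(canonical, id ->
                catalogResolver.apply(id).orElseThrow(
                        () -> new IllegalStateException("no binding for " + id)));
    }
}
```

Keying the cache by the canonical ID means that a lookup by bean name and a later lookup by model ID share one cached instance.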
Public API
- ChatModel for the given ID. Resolves the canonical model ID, looks up the binding, asks the catalog to resolve a live model, and caches the result. Throws ModelSelectionException when the ID or catalog is missing.
- ChatModel.
- AVAILABLE descriptors filtered by provider name (case insensitive).
- @PreDestroy hook that clears descriptors, instance cache, and aliases.
ModelSelectionStrategy
Interface for model selection logic. The DynamicModelSelectionStrategy is the default implementation.
public interface ModelSelectionStrategy
DynamicModelSelectionStrategy
The default selection strategy uses an explicit-model-first fallback chain. It is constructed with a DynamicModelRegistry, the TieredLLMProperties, and the primary ChatModel bean.
public class DynamicModelSelectionStrategy implements ModelSelectionStrategy
- Explicit model request: ExecutionContext.preferredModel is tried first. When absent, the strategy looks for metadata keys in this order: requestedModelId, preferredModel, runtimeModelId, modelId.
- Tier resolution: When no explicit model resolves, the strategy picks a tier in this order: analysisLevel.getDefaultTier(), then securityTaskType.getDefaultTier(), then ExecutionContext.tier. Values above 1 are normalized to configured layer 2.
- Primary/backup lookup: The strategy reads TieredLLMProperties.getModelNameForTier(...) for the primary model and getBackupModelNameForTier(...) for the backup.
- Primary ChatModel fallback: If every tier candidate fails, the auto-configured primary ChatModel bean is returned as the last fallback.
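The first two steps of the chain can be sketched with plain Java types. The class SelectionOrderSketch and both method names are hypothetical; passing tiers as Integer values, returning null when no tier is supplied, and mapping values below 1 to layer 1 are assumptions the text does not confirm.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the explicit-hint lookup and tier normalization. */
final class SelectionOrderSketch {

    // Metadata keys in the order the documentation says they are consulted.
    private static final List<String> HINT_KEYS =
            List.of("requestedModelId", "preferredModel", "runtimeModelId", "modelId");

    /** Returns the explicit model hint, or null when none is present. */
    static String explicitModel(String preferredModel, Map<String, Object> metadata) {
        if (preferredModel != null && !preferredModel.isBlank()) {
            return preferredModel;
        }
        if (metadata == null) {
            return null;
        }
        for (String key : HINT_KEYS) {
            Object value = metadata.get(key);
            if (value instanceof String && !((String) value).isBlank()) {
                return (String) value;
            }
        }
        return null;
    }

    /**
     * Picks the first non-null tier (analysis level, then task type, then context tier)
     * and normalizes it to a configured layer. Returns null when no tier is known
     * (assumption: the caller then falls through to the primary ChatModel).
     */
    static Integer resolveLayer(Integer analysisLevelTier, Integer taskTypeTier, Integer contextTier) {
        Integer tier = analysisLevelTier != null ? analysisLevelTier
                : taskTypeTier != null ? taskTypeTier
                : contextTier;
        if (tier == null) {
            return null;
        }
        return tier > 1 ? 2 : 1; // values above 1 collapse to layer 2
    }
}
```

For example, a request with no preferredModel whose metadata carries both runtimeModelId and modelId resolves to the runtimeModelId value, and a task type with default tier 3 lands on layer 2.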
Resolution Metadata
The strategy records the outcome on ExecutionContext.metadata. Downstream code and advisors can inspect these keys:
| Key | Description |
|---|---|
| requestedModelId | The explicit model ID requested, when one was provided. |
| requestedModelSourceKey | Where the requested ID came from (for example, executionContext.preferredModel or executionContext.metadata.runtimeModelId). |
| selectedModelId | The model ID that was finally selected. |
| selectedModelProvider | Provider name from the matching ModelDescriptor, when available. |
| runtimeModelId | Duplicate of selectedModelId for downstream consumers that look for it under that key. |
| modelSelectionSource | Why the model was chosen (for example, explicit_model, analysis_level:NORMAL, security_task_type:FORENSIC_ANALYSIS, tier:2, or primary_chat_model). |
| modelSelectionFallbackUsed | true when a backup or the primary fallback was used. |
| modelSelectionCandidates | Ordered list of model IDs the strategy tried. |
| modelSelectionFailure | Reason for failure when no model could be selected. |
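Downstream code could summarize these keys along the following lines. This is a sketch under assumptions: the metadata is treated as a plain Map<String, Object>, the fallback flag is assumed to be stored as a Boolean, and SelectionMetadataSketch and describe are hypothetical names.

```java
import java.util.Map;

/** Hypothetical helper that turns the resolution metadata into a log-friendly summary. */
final class SelectionMetadataSketch {

    static String describe(Map<String, Object> metadata) {
        // A recorded failure takes precedence over any selection details.
        Object failure = metadata.get("modelSelectionFailure");
        if (failure != null) {
            return "selection failed: " + failure;
        }
        Object selected = metadata.get("selectedModelId");
        Object source = metadata.get("modelSelectionSource");
        // Assumption: the flag is stored as a Boolean, so anything else counts as false.
        boolean fallback = Boolean.TRUE.equals(metadata.get("modelSelectionFallbackUsed"));
        return "selected " + selected + " via " + source + (fallback ? " (fallback)" : "");
    }
}
```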
Additional Method
Configuration
Tiered Model Hierarchy
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:7b
        backup:
          model: qwen2.5:7b
      layer2:
        model: gpt-4o-mini
        backup:
          model: llama3.1:8b
      tiered:
        layer1:
          timeout-ms: 30000
        layer2:
          timeout-ms: 60000
contexa:
  llm:
    chat:
      ollama:
        base-url: http://localhost:11434
        model: qwen2.5:7b
Selection order: explicit model request → analysisLevel → securityTaskType → explicit tier → primary ChatModel fallback. The runtime only has configured layers 1 and 2, so higher semantic tiers are normalized to layer 2.