contexa-core

Model Providers

The model provider system manages LLM models across multiple runtimes. The DynamicModelRegistry registers models from the LlmRuntimeCatalog chat bindings and the configured TieredLLMProperties layers, while the DynamicModelSelectionStrategy resolves the runtime model for each request from explicit model hints, tier mapping, and a primary ChatModel fallback.

Overview

Contexa supports multiple LLM runtimes simultaneously. At startup, the DynamicModelRegistry populates its descriptor map through two mechanisms:

  1. LlmRuntimeCatalog chat bindings — every LlmRuntimeBinding returned from LlmRuntimeCatalog.getChatBindings() is registered. The catalog is built from Spring AI ChatModel beans created by the Contexa runtime (for example via contexa.llm.chat.ollama.*) and by Spring AI autoconfigurations (Anthropic, OpenAI, and others).
  2. TieredLLMProperties layers — the primary and optional backup models configured for spring.ai.security.layer1 and spring.ai.security.layer2 are registered with their tier. The registry uses resolveCanonicalModelId(...) so the same model can be found by its runtime ID, bean name, or any alias.

The registry infers the provider name from the model ID when no binding is available. Recognised substrings include llama, qwen, gemma, mistral, phi, exaone, codellama, and deepseek (→ ollama), claude (→ anthropic), gpt/o1/davinci (→ openai), gemini/vertex (→ gemini), and bedrock (→ bedrock). Anything else is labelled spring.
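The substring rules above can be sketched as a small ordered lookup. This is a minimal illustration, not Contexa's code: ProviderInference and inferProvider are hypothetical names. Two things are assumptions: that matching is case-insensitive, and the order in which the rules are checked, which matters when an ID contains more than one recognised substring.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper mirroring the documented substring rules (illustration only).
final class ProviderInference {

    // Ordered rules (assumed order): the first provider with a matching substring wins.
    private static final List<Map.Entry<String, List<String>>> RULES = List.of(
            Map.entry("ollama", List.of("llama", "qwen", "gemma", "mistral", "phi",
                    "exaone", "codellama", "deepseek")),
            Map.entry("anthropic", List.of("claude")),
            Map.entry("openai", List.of("gpt", "o1", "davinci")),
            Map.entry("gemini", List.of("gemini", "vertex")),
            Map.entry("bedrock", List.of("bedrock")));

    static String inferProvider(String modelId) {
        if (modelId == null || modelId.isBlank()) {
            return "spring";
        }
        // Assumption: IDs are compared in lower case.
        String id = modelId.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, List<String>> rule : RULES) {
            for (String fragment : rule.getValue()) {
                if (id.contains(fragment)) {
                    return rule.getKey();
                }
            }
        }
        return "spring"; // documented catch-all label
    }
}
```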

LlmRuntimeCatalog

The interface that exposes discovered chat and embedding runtimes. The registry uses it to find bindings and resolve live ChatModel instances on demand.

public interface LlmRuntimeCatalog
getChatBindings() List<LlmRuntimeBinding>
Returns all chat runtime bindings known to the catalog.
getEmbeddingBindings() List<LlmRuntimeBinding>
Returns all embedding runtime bindings known to the catalog.
findChatBinding(String selector) Optional<LlmRuntimeBinding>
Locates a chat binding by runtime ID, bean name, model ID, or any registered alias.
findEmbeddingBinding(String selector) Optional<LlmRuntimeBinding>
Locates an embedding binding by any of the same selectors.
resolveChatModel(String selector) ChatModel
Returns the live Spring AI ChatModel for the matching binding.
resolveEmbeddingModel(String selector) EmbeddingModel
Returns the live Spring AI EmbeddingModel for the matching binding.
resolvePrimaryChatModel(String priorityConfig) Optional<ChatModel>
Resolves the primary chat model using a comma-separated priority configuration string.
resolvePrimaryEmbeddingModel(String priorityConfig) Optional<EmbeddingModel>
Resolves the primary embedding model using a priority configuration string.
resolveSpringPrimaryChatModel() Optional<ChatModel>
Returns the Spring AI primary ChatModel bean when one is present in the context.
resolveSpringPrimaryEmbeddingModel() Optional<EmbeddingModel>
Returns the Spring AI primary EmbeddingModel bean when one is present.
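The exact way resolvePrimaryChatModel(...) consumes its comma-separated priority string is not spelled out here. One plausible reading, sketched below purely as an assumption, is that selectors are tried left to right and the first one that resolves wins. PriorityResolution and resolveFirst are hypothetical names, and the lookup function stands in for a catalog call such as findChatBinding(...).

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of priority-string resolution; the real catalog may differ.
final class PriorityResolution {

    // Tries each comma-separated selector in order and returns the first hit.
    static <T> Optional<T> resolveFirst(String priorityConfig,
                                        Function<String, Optional<T>> lookup) {
        if (priorityConfig == null || priorityConfig.isBlank()) {
            return Optional.empty();
        }
        return Arrays.stream(priorityConfig.split(","))
                .map(String::trim)                         // tolerate "a, b ,c"
                .filter(selector -> !selector.isEmpty())   // skip empty entries
                .map(lookup)                               // selector -> Optional<T>
                .flatMap(Optional::stream)                 // drop misses
                .findFirst();                              // first resolved selector wins
    }
}
```

Under this reading, a value such as `qwen2.5:7b,gpt-4o-mini` would prefer the Ollama binding and fall through to the OpenAI one only when the first selector does not resolve.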

LlmRuntimeBinding

An immutable record of a single runtime binding. Bindings are created by Contexa's LLM autoconfiguration and consumed by both the catalog and the registry.

public final class LlmRuntimeBinding
Property | Type | Description
runtimeId | String | Contexa runtime ID assigned to this binding.
beanName | String | Spring bean name of the underlying model bean.
provider | String | Provider identifier (for example, ollama, anthropic, openai).
modelId | String | Canonical model ID reported by the runtime (for example, llama3.1:8b).
aliases | Set<String> | Additional selectors that resolve to the same binding.
type | LlmRuntimeType | Chat or embedding runtime type.
primary | boolean | Whether this binding is marked as the Spring primary for its type.
source | String | Origin of the binding (for example, an autoconfiguration class or configuration key).

Helper Methods

Method | Return | Description
canonicalId() | String | Returns the preferred identifier for the binding: modelId, then runtimeId, then beanName.
matches(String selector) | boolean | Returns whether the selector matches runtimeId, beanName, modelId, or any alias.
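The two helpers reduce to a fallback chain and an equality check. The record below is a simplified, hypothetical stand-in for LlmRuntimeBinding that keeps only the identifier fields; treating blank values as missing and matching selectors exactly (case-sensitive) are assumptions.

```java
import java.util.Set;

// Hypothetical stand-in for LlmRuntimeBinding: identifier fields only.
record BindingSketch(String runtimeId, String beanName, String modelId, Set<String> aliases) {

    // Preferred identifier: modelId, then runtimeId, then beanName.
    String canonicalId() {
        if (hasText(modelId)) return modelId;
        if (hasText(runtimeId)) return runtimeId;
        return beanName;
    }

    // True when the selector equals any identifier or alias (assumed exact match).
    boolean matches(String selector) {
        if (!hasText(selector)) return false;
        return selector.equals(runtimeId)
                || selector.equals(beanName)
                || selector.equals(modelId)
                || (aliases != null && aliases.contains(selector));
    }

    private static boolean hasText(String s) {
        return s != null && !s.isBlank();
    }
}
```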

ModelDescriptor

Describes a model's identity, capabilities, default options, and current status.

@Data @Builder
public class ModelDescriptor
Property | Type | Description
modelId | String | Unique model identifier (for example, llama3.1:8b, claude-3-opus).
displayName | String | Human-readable model name.
provider | String | Provider name (ollama, anthropic, openai).
tier | Integer | The configured runtime tier assigned to the model (1 or 2); null if unassigned.
version | String | Model version string.
capabilities | ModelCapabilities | Nested class describing streaming, tool calling, multimodal support, context window, and output budget.
options | ModelOptions | Nested class describing default sampling options (temperature, topP, topK, repetitionPenalty).
status | ModelStatus | AVAILABLE or UNAVAILABLE.

ModelCapabilities (nested class)

Field | Type | Default
streaming | boolean | true
toolCalling | boolean | false
functionCalling | boolean | false
multiModal | boolean | false
maxTokens | int | 4096
contextWindow | int | 4096
maxOutputTokens | int | 4096

ModelOptions (nested class)

Field | Type | Default
temperature | Double | 0.7
topP | Double | 0.9
topK | Integer | null
repetitionPenalty | Double | 1.0

Helper Method

supportsAdvancedFeatures() boolean
Returns true when the capabilities declare tool calling, function calling, or multimodal support.
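Because the descriptor classes rely on Lombok, the defaults and the helper are easier to see in plain Java. CapabilitiesSketch is a hypothetical stand-in for the nested ModelCapabilities class, initialized with the default values from the table above; placing supportsAdvancedFeatures() on it rather than on ModelDescriptor is a simplification.

```java
// Hypothetical plain-Java stand-in for ModelCapabilities (no Lombok).
final class CapabilitiesSketch {
    boolean streaming = true;
    boolean toolCalling = false;
    boolean functionCalling = false;
    boolean multiModal = false;
    int maxTokens = 4096;
    int contextWindow = 4096;
    int maxOutputTokens = 4096;

    // Mirrors the documented rule: any one of the three flags qualifies.
    boolean supportsAdvancedFeatures() {
        return toolCalling || functionCalling || multiModal;
    }
}
```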

DynamicModelRegistry

Central registry that manages descriptors and caches ChatModel instances. It registers models from the catalog and from tier configuration during @PostConstruct initialization.

public class DynamicModelRegistry

Constructor Dependencies

Dependency | Description
ApplicationContext | Spring application context reference used during initialization.
TieredLLMProperties | Configured tier hierarchy (layer 1 and layer 2, primary and backup models).
LlmRuntimeCatalog | Catalog of Spring AI runtime bindings. Optional: when absent, getModel(...) raises ModelSelectionException.

Public API

getModel(String modelId) ChatModel
Returns the ChatModel for the given ID. Resolves the canonical model ID, looks up the binding, asks the catalog to resolve a live model, and caches the result. Throws ModelSelectionException when the ID or catalog is missing.
getAllModels() Collection<ModelDescriptor>
Returns a copy of every registered model descriptor.
getDescriptor(String modelId) ModelDescriptor
Returns the registered descriptor for a model ID without instantiating a new ChatModel.
getModelsByProvider(String provider) List<ModelDescriptor>
Returns AVAILABLE descriptors filtered by provider name (case-insensitive).
registerModel(ModelDescriptor descriptor) void
Registers a descriptor or merges it into an existing one. Existing tier and provider values are preserved when the incoming descriptor leaves them unset.
refreshModels() void
Clears descriptors, instance cache, and aliases, then re-registers catalog bindings and configuration layers.
updateModelStatus(String modelId, ModelStatus status) void
Updates the availability status of a registered model.
shutdown() void
@PreDestroy hook that clears descriptors, instance cache, and aliases.
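The getModel(...) and registerModel(...) descriptions combine a cache-on-first-use lookup with a merge that keeps existing values when incoming ones are unset. The sketch below illustrates both under stated assumptions: RegistrySketch is hypothetical, the type parameter M stands in for ChatModel, a Function stands in for LlmRuntimeCatalog, and IllegalArgumentException/IllegalStateException stand in for ModelSelectionException.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of the registry's lookup, cache, and merge paths.
final class RegistrySketch<M> {

    // Simplified descriptor: only the fields the merge rule touches.
    record Descriptor(String modelId, String provider, Integer tier) {}

    private final Map<String, Descriptor> descriptors = new ConcurrentHashMap<>();
    private final Map<String, M> instances = new ConcurrentHashMap<>();
    private final Map<String, String> aliases = new ConcurrentHashMap<>();
    private final Function<String, Optional<M>> catalog; // null models an absent catalog

    RegistrySketch(Function<String, Optional<M>> catalog) {
        this.catalog = catalog;
    }

    void addAlias(String alias, String canonicalId) {
        aliases.put(alias, canonicalId);
    }

    // Register or merge: keep existing provider/tier when the incoming value is unset.
    void registerModel(Descriptor incoming) {
        descriptors.merge(incoming.modelId(), incoming, (existing, next) -> new Descriptor(
                existing.modelId(),
                next.provider() != null ? next.provider() : existing.provider(),
                next.tier() != null ? next.tier() : existing.tier()));
    }

    Descriptor getDescriptor(String modelId) {
        return descriptors.get(aliases.getOrDefault(modelId, modelId));
    }

    // Resolve the canonical ID, ask the catalog for a live model, cache the result.
    M getModel(String modelId) {
        if (modelId == null || modelId.isBlank()) {
            throw new IllegalArgumentException("modelId is required");
        }
        if (catalog == null) {
            throw new IllegalStateException("no runtime catalog configured");
        }
        String canonical = aliases.getOrDefault(modelId, modelId);
        return instances.computeIfAbsent(canonical, id -> catalog.apply(id)
                .orElseThrow(() -> new IllegalStateException("no binding for " + id)));
    }
}
```

With computeIfAbsent, a resolution that throws leaves no cache entry behind, so a later call can still succeed once a binding becomes available.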

ModelSelectionStrategy

Interface for model selection logic. The DynamicModelSelectionStrategy is the default implementation.

public interface ModelSelectionStrategy
selectModel(ExecutionContext context) ChatModel
Selects the best model for the given execution context. Returns null if no model is available.
getSupportedModels() Set<String>
Returns the set of all model IDs available for selection.
isModelAvailable(String modelName) boolean
Checks whether a specific model is currently available.

DynamicModelSelectionStrategy

The default selection strategy uses an explicit-model-first fallback chain. It is constructed with a DynamicModelRegistry, the TieredLLMProperties, and the primary ChatModel bean.

public class DynamicModelSelectionStrategy implements ModelSelectionStrategy
  1. Explicit model request: ExecutionContext.preferredModel is tried first. When it is absent, the strategy looks for these metadata keys in order: requestedModelId, preferredModel, runtimeModelId, modelId.
  2. Tier resolution: when no explicit model resolves, the strategy picks a tier in this order: analysisLevel.getDefaultTier(), then securityTaskType.getDefaultTier(), then ExecutionContext.tier. Any value above 1 is normalized to configured layer 2.
  3. Primary/backup lookup: for the resolved tier, the strategy tries the primary model from TieredLLMProperties.getModelNameForTier(...), then the backup model from getBackupModelNameForTier(...).
  4. Primary ChatModel fallback: if every tier candidate fails, the auto-configured primary ChatModel bean is returned as the last fallback.
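The four steps can be sketched as a single method. This is a hypothetical simplification, not the shipped DynamicModelSelectionStrategy: model IDs stand in for ChatModel instances, an availability predicate stands in for registry lookups, the three tier sources are collapsed into one nullable tier argument, and only a subset of the resolution metadata keys is recorded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of the explicit-model-first fallback chain.
final class SelectionSketch {

    // Any tier above 1 is normalized to layer 2.
    static int toLayer(int tier) {
        return tier > 1 ? 2 : 1;
    }

    static String select(String explicitModel, Integer tier,
                         Map<Integer, String> primaryByLayer,
                         Map<Integer, String> backupByLayer,
                         String primaryChatModelId,
                         Predicate<String> isAvailable,
                         Map<String, Object> metadata) {
        List<String> candidates = new ArrayList<>();
        metadata.put("modelSelectionCandidates", candidates);

        // Step 1: an explicit model request is tried first.
        if (explicitModel != null) {
            candidates.add(explicitModel);
            if (isAvailable.test(explicitModel)) {
                return selected(metadata, explicitModel, "explicit_model", false);
            }
        }
        // Steps 2-3: resolve a layer, then try its primary and its backup.
        if (tier != null) {
            int layer = toLayer(tier);
            String primary = primaryByLayer.get(layer);
            String backup = backupByLayer.get(layer);
            if (primary != null) {
                candidates.add(primary);
                if (isAvailable.test(primary)) {
                    return selected(metadata, primary, "tier:" + layer, false);
                }
            }
            if (backup != null) {
                candidates.add(backup);
                if (isAvailable.test(backup)) {
                    return selected(metadata, backup, "tier:" + layer, true);
                }
            }
        }
        // Step 4: the primary ChatModel bean is the last fallback.
        return selected(metadata, primaryChatModelId, "primary_chat_model", true);
    }

    // Records the outcome under the documented metadata keys.
    private static String selected(Map<String, Object> metadata, String modelId,
                                   String source, boolean fallbackUsed) {
        metadata.put("selectedModelId", modelId);
        metadata.put("runtimeModelId", modelId);
        metadata.put("modelSelectionSource", source);
        metadata.put("modelSelectionFallbackUsed", fallbackUsed);
        return modelId;
    }
}
```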

Resolution Metadata

The strategy records the outcome on ExecutionContext.metadata. Downstream code and advisors can inspect these keys:

Key | Description
requestedModelId | The explicit model ID requested, when one was provided.
requestedModelSourceKey | Where the requested ID came from (for example, executionContext.preferredModel or executionContext.metadata.runtimeModelId).
selectedModelId | The model ID that was finally selected.
selectedModelProvider | Provider name from the matching ModelDescriptor, when available.
runtimeModelId | Duplicate of selectedModelId for downstream consumers that look for it under that key.
modelSelectionSource | Why the model was chosen (for example, explicit_model, analysis_level:NORMAL, security_task_type:FORENSIC_ANALYSIS, tier:2, or primary_chat_model).
modelSelectionFallbackUsed | true when a backup or the primary fallback was used.
modelSelectionCandidates | Ordered list of model IDs the strategy tried.
modelSelectionFailure | Reason for failure when no model could be selected.
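Downstream code needs only plain map lookups to act on these keys. The helper below is a hypothetical example that turns the metadata into a one-line audit message; it assumes the metadata is exposed as a Map<String, Object> and that modelSelectionCandidates holds a List.

```java
import java.util.List;
import java.util.Map;

// Hypothetical downstream helper that summarizes the resolution metadata.
final class SelectionAudit {

    static String summarize(Map<String, Object> metadata) {
        // A recorded failure takes precedence over everything else.
        Object failure = metadata.get("modelSelectionFailure");
        if (failure != null) {
            return "model selection failed: " + failure;
        }
        String selected = String.valueOf(metadata.get("selectedModelId"));
        String source = String.valueOf(metadata.get("modelSelectionSource"));
        boolean fallback = Boolean.TRUE.equals(metadata.get("modelSelectionFallbackUsed"));
        Object candidates = metadata.getOrDefault("modelSelectionCandidates", List.of());
        return selected + " via " + source + (fallback ? " (fallback)" : "")
                + " after trying " + candidates;
    }
}
```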

Additional Method

getModelCapabilities(String modelName) Map<String, Object>
Returns a map that summarises the descriptor's model ID, provider, capabilities block, context window, maxTokens, streaming flag, tier, and status. Returns an empty map when the model is not registered.

Configuration

Tiered Model Hierarchy

```yaml
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:7b
        backup:
          model: qwen2.5:7b
      layer2:
        model: gpt-4o-mini
        backup:
          model: llama3.1:8b
      tiered:
        layer1:
          timeout-ms: 30000
        layer2:
          timeout-ms: 60000

contexa:
  llm:
    chat:
      ollama:
        base-url: http://localhost:11434
        model: qwen2.5:7b
```

Selection order: explicit model request → analysisLevel → securityTaskType → explicit tier → primary ChatModel fallback. For whichever tier is resolved, the primary model is tried before the backup. Only layers 1 and 2 can be configured, so higher semantic tiers are normalized to layer 2.