LLM & Models
The UnifiedLLMOrchestrator is the central LLM client for the Contexa platform. It provides a single entry point for synchronous calls, streaming, structured entity extraction, and tool-calling — with tiered model selection, advisor integration, and automatic retry logic.
Overview
The orchestrator implements both LLMOperations (context-based API) and ToolCapableLLMClient (prompt-based API), providing flexible access to LLM models across the platform.
When to Use Directly
In most cases, you do not need to call the orchestrator directly. The Pipeline's LLM_EXECUTION step calls it automatically. Direct use is appropriate when:
| Scenario | Recommended Approach |
|---|---|
| Full AI feature with structured input/output | Use the Strategy/Lab/Pipeline architecture. The pipeline calls the orchestrator for you. |
| Simple chat or text generation | Use the orchestrator directly with ExecutionContext. |
| Tool-calling / agentic workflows | Use the orchestrator's callTools() or callToolCallbacks() methods directly. |
| Custom model selection or temperature | Use ExecutionContext.builder() with explicit tier, model, or temperature settings. |
Tiered Model Architecture
The orchestrator uses tier-aware model selection. In the current OSS runtime, TieredLLMProperties exposes two configured layers (layer1 and layer2). Higher semantic tiers from AnalysisLevel or SecurityTaskType are normalized to the configured layer-2 runtime unless an explicit model is requested.
| Tier | Characteristics | Typical Use |
|---|---|---|
| Tier 1 | Fast, low temperature (e.g., 0.3) | Configured runtime layer 1 for quick detection, threat filtering, and latency-sensitive decisions |
| Tier 2 | Balanced, moderate temperature (e.g., 0.7) | Configured runtime layer 2 for contextual analysis, behavior analysis, and correlation |
| Tier 3 | Semantic "deep" tier used by enums such as DEEP and EXPERT_INVESTIGATION | Normalized to the configured layer-2 runtime unless an explicit model override is supplied |
How Tier Selection Works
The effective model is resolved in this order:
1. `preferredModel` or execution metadata such as `requestedModelId`, `preferredModel`, `runtimeModelId`, or `modelId` — explicit model IDs bypass tier mapping entirely.
2. `analysisLevel` — QUICK→1, NORMAL→2, DEEP→3 (normalized to configured layer 2 at runtime).
3. `securityTaskType` — security tasks map to default semantic tiers when `analysisLevel` is absent.
4. `tier` — explicit numeric tier when no higher-priority selector is present; values above 1 are normalized to layer 2.
5. The primary `ChatModel` bean as the final fallback if no configured model can be resolved.
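This resolution order can be sketched as pure logic. The class and method names below are illustrative only, not the orchestrator's actual internals; they model the documented precedence under the assumption that layers 1 and 2 are the only configured runtimes:

```java
import java.util.Map;
import java.util.Optional;

// Illustrative sketch of the documented model-resolution order; not the real implementation.
final class ModelResolutionSketch {

    /** Step 1: an explicit model ID (preferredModel or metadata) bypasses tier mapping entirely. */
    static Optional<String> explicitModel(String preferredModel, Map<String, Object> metadata) {
        if (preferredModel != null) return Optional.of(preferredModel);
        for (String key : new String[] {"requestedModelId", "preferredModel", "runtimeModelId", "modelId"}) {
            Object v = metadata.get(key);
            if (v instanceof String s && !s.isBlank()) return Optional.of(s);
        }
        return Optional.empty();
    }

    /** Steps 2–4: map analysisLevel or explicit tier to a configured runtime layer (1 or 2). */
    static int effectiveLayer(Integer tier, String analysisLevel) {
        int semanticTier;
        if (analysisLevel != null) {
            semanticTier = switch (analysisLevel) {
                case "QUICK" -> 1;
                case "NORMAL" -> 2;
                case "DEEP" -> 3;
                default -> 2;
            };
        } else if (tier != null) {
            semanticTier = tier;
        } else {
            semanticTier = 2; // fall back to the balanced layer
        }
        // Values above 1 are normalized to the configured layer-2 runtime.
        return semanticTier <= 1 ? 1 : 2;
    }
}
```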
AnalysisLevel Enum
| Value | Default Tier | Default Timeout |
|---|---|---|
| QUICK | 1 | 50ms |
| NORMAL | 2 | 300ms |
| DEEP | 3 | 5000ms |
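The defaults above can be mirrored as a small enum sketch. This is illustrative only; the real `AnalysisLevel` is nested in `ExecutionContext` and its field names may differ:

```java
// Sketch of the documented AnalysisLevel defaults; field names are illustrative.
enum AnalysisLevelSketch {
    QUICK(1, 50),
    NORMAL(2, 300),
    DEEP(3, 5000);

    final int defaultTier;
    final long defaultTimeoutMs;

    AnalysisLevelSketch(int defaultTier, long defaultTimeoutMs) {
        this.defaultTier = defaultTier;
        this.defaultTimeoutMs = defaultTimeoutMs;
    }
}
```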
ExecutionContext
The primary way to configure an LLM request, built with the @Builder pattern:
```java
ExecutionContext context = ExecutionContext.builder()
    .prompt(new Prompt("Analyze this for threats"))
    .analysisLevel(ExecutionContext.AnalysisLevel.NORMAL)
    .userId("user-123")
    .requestId(UUID.randomUUID().toString())
    .build();

// Execute
Mono<String> result = orchestrator.execute(context);

// Or stream
Flux<String> chunks = orchestrator.stream(context);
```
Key Properties
| Property | Type | Description |
|---|---|---|
| `prompt` | `Prompt` | The Spring AI prompt to send to the LLM. |
| `analysisLevel` | `AnalysisLevel` | Analysis depth: QUICK, NORMAL, or DEEP. Determines the default tier. |
| `tier` | `Integer` | Explicit semantic tier. The current runtime accepts 1 or 2 as configured layers and normalizes higher values to layer 2. |
| `preferredModel` | `String` | Explicit model name. Bypasses all tier-based selection. |
| `userId` | `String` | Authenticated user ID for security context. |
| `requestId` | `String` | Unique request ID for tracing. |
| `temperature` | `Double` | Sampling temperature override. |
| `maxTokens` | `Integer` | Maximum output tokens override. |
| `streamingMode` | `Boolean` | Whether to use streaming execution. |
| `toolCallbacks` | `List<ToolCallback>` | Tool callbacks for tool-calling execution. |
| `toolProviders` | `List<Object>` | Tool provider objects for tool-calling execution. |
| `seed` | `Integer` | Deterministic sampling seed. Useful for reproducible analysis results. |
| `chatOptions` | `ChatOptions` | Spring AI `ChatOptions` to pass directly to the model. Overrides individual temperature/maxTokens settings. |
| `advisors` | `List<Advisor>` | Request-scoped advisors applied in addition to the global `AdvisorRegistry`. |
| `metadata` | `Map<String, Object>` | Arbitrary metadata. Recognized keys: `disableRetries` (skip retry logic), `disableOllamaThinking` (suppress Ollama ThinkOption). |
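The rule that `chatOptions` wins over the individual `temperature`/`maxTokens` fields can be sketched as a precedence function. This helper is hypothetical, written only to make the documented ordering concrete:

```java
// Illustrative precedence: explicit ChatOptions win over individual overrides,
// which in turn win over the tier default.
final class OptionPrecedenceSketch {
    static double effectiveTemperature(Double chatOptionsTemp, Double contextTemp, double tierDefault) {
        if (chatOptionsTemp != null) return chatOptionsTemp; // ChatOptions wins
        if (contextTemp != null) return contextTemp;         // then the context-level override
        return tierDefault;                                  // then the tier default
    }
}
```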
Factory & Helper Methods
| Method | Return | Description |
|---|---|---|
| `ExecutionContext.from(Prompt)` | `ExecutionContext` | Create a minimal context from a Spring AI `Prompt`. |
| `ExecutionContext.forTier(int, Prompt)` | `ExecutionContext` | Create a context locked to a specific tier. |
| `addMetadata(String, Object)` | `ExecutionContext` | Fluent setter to add a single metadata entry. |
| `addAdvisor(Advisor)` | `ExecutionContext` | Fluent setter to add a request-scoped advisor. |
| `addToolCallback(ToolCallback)` | `ExecutionContext` | Fluent setter to add a tool callback. |
| `getEffectiveTier()` | `int` | Resolve the actual tier from explicit `tier`, `analysisLevel`, or `securityTaskType`. |
SecurityTaskType Enum (full list)
Tasks that default to tier 3 still flow through the configured layer-2 model unless `preferredModel` or metadata explicitly selects another model.
| Value | Default Tier | Description |
|---|---|---|
| THREAT_FILTERING | 1 | Fast threat filtering for real-time requests. |
| QUICK_DETECTION | 1 | Quick anomaly detection with minimal latency. |
| CONTEXTUAL_ANALYSIS | 2 | Context-aware security analysis. |
| BEHAVIOR_ANALYSIS | 2 | User behavior pattern analysis. |
| CORRELATION | 2 | Cross-event correlation analysis. |
| EXPERT_INVESTIGATION | 3 | Deep expert-level investigation. |
| SOAR_AUTOMATION | 3 | SOAR orchestration and automated incident response (Enterprise). |
| INCIDENT_RESPONSE | 3 | Automated incident response planning. |
| FORENSIC_ANALYSIS | 3 | Forensic analysis of security events. |
| APPROVAL_WORKFLOW | 3 | Human-in-the-loop approval workflows. |
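The table's default-tier mapping, combined with the layer-2 normalization described above, can be sketched as follows. The class is illustrative; the real mapping lives on the SecurityTaskType enum itself:

```java
import java.util.Map;

// Sketch of the documented default-tier mapping and layer-2 normalization; not the real enum.
final class SecurityTaskTierSketch {
    static final Map<String, Integer> DEFAULT_TIER = Map.of(
        "THREAT_FILTERING", 1,
        "QUICK_DETECTION", 1,
        "CONTEXTUAL_ANALYSIS", 2,
        "BEHAVIOR_ANALYSIS", 2,
        "CORRELATION", 2,
        "EXPERT_INVESTIGATION", 3,
        "SOAR_AUTOMATION", 3,
        "INCIDENT_RESPONSE", 3,
        "FORENSIC_ANALYSIS", 3,
        "APPROVAL_WORKFLOW", 3);

    /** Tier-3 tasks still run on the configured layer-2 runtime unless a model is pinned. */
    static int runtimeLayer(String taskType) {
        int tier = DEFAULT_TIER.getOrDefault(taskType, 2);
        return tier <= 1 ? 1 : 2;
    }
}
```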
Tool Calling
The orchestrator supports Spring AI tool calling for agentic workflows. Two approaches:
Tool Providers (Annotated Methods)
```java
// Tool provider objects - Spring AI discovers @Tool methods
Prompt prompt = new Prompt("Block suspicious IP 192.168.1.100");
Mono<String> result = orchestrator.callTools(
    prompt, List.of(ipBlockTool, auditTool));
```
ToolCallback Array (Explicit)
```java
// Explicit ToolCallback array
ToolCallback[] callbacks = new ToolCallback[] {
    myToolCallback1, myToolCallback2
};
Mono<String> result = orchestrator.callToolCallbacks(
    prompt, callbacks);
```
Both approaches also support streaming: `streamTools()` and `streamToolCallbacks()`.
Advisor System
The orchestrator pulls enabled advisors from the AdvisorRegistry and combines them with any request-scoped advisors attached to the ExecutionContext. Those advisors are applied when the orchestrator builds a chat call for the selected model.
Implementing a Custom Advisor
Extend `BaseAdvisor` to create advisors that intercept LLM requests and responses:
```java
@Component
public class RequestLoggingAdvisor extends BaseAdvisor {

    public RequestLoggingAdvisor() {
        super("myapp", "request-logging", 100);
        // domain, name, order (lower = runs first)
    }

    @Override
    public ChatClientRequest beforeCall(ChatClientRequest request) {
        // Inspect or modify the request before the LLM call,
        // e.g., log the prompt, add metadata
        return request;
    }

    @Override
    public ChatClientResponse afterCall(ChatClientResponse response,
                                        ChatClientRequest request) {
        // Inspect or modify the response after the LLM call,
        // e.g., validate the response, log metrics
        return response;
    }
}
```
Advisors are auto-registered as Spring beans. The AdvisorRegistry manages enabled/disabled state and provides sorted advisor snapshots to the orchestrator.
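The "lower order runs first" rule amounts to an ascending sort on the order value. A plain-Java sketch (illustrative; the real registry works with Spring AI's Advisor contract, not this record):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an advisor's identity and order value.
record AdvisorSketch(String name, int order) {}

// Sketch: advisors are applied in ascending order, so lower values run first.
final class AdvisorOrderingSketch {
    static List<String> executionOrder(List<AdvisorSketch> advisors) {
        List<AdvisorSketch> sorted = new ArrayList<>(advisors);
        sorted.sort(Comparator.comparingInt(AdvisorSketch::order));
        return sorted.stream().map(AdvisorSketch::name).toList();
    }
}
```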
Configuration
| Property | Description | Default |
|---|---|---|
| `spring.ai.security.layer1.model` | Primary model selected for layer 1 requests | `qwen2.5:14b` |
| `spring.ai.security.layer2.model` | Primary model selected for layer 2 requests | `exaone3.5:latest` |
| `spring.ai.security.tiered.layer1.timeout-ms` | Execution timeout budget for layer 1 model calls | `30000` |
| `contexa.llm.chat.ollama.base-url` | Ollama chat runtime base URL when the dedicated Contexa Ollama path is enabled | empty |
| `contexa.llm.chat.ollama.model` | Default Ollama chat model for the Contexa-managed runtime | empty |
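Assuming a standard Spring `application.yml` layout, these properties might be set as follows. The values shown are the documented defaults, except the Ollama entries, which are empty by default; the base URL below is only an example:

```yaml
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:14b
      layer2:
        model: exaone3.5:latest
      tiered:
        layer1:
          timeout-ms: 30000

contexa:
  llm:
    chat:
      ollama:
        base-url: http://localhost:11434   # empty by default; example value only
        model: ""                          # empty by default
```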
See AI Configuration for all model, tier, advisor, and retry properties.
API Reference
LLMOperations Interface

```java
public interface LLMOperations
```

LLMClient Interface

```java
public interface LLMClient
```

ToolCapableLLMClient Interface

```java
public interface ToolCapableLLMClient extends LLMClient
```

Extends `LLMClient` with tool-calling methods whose results surface the full `ChatResponse`, including tool call details.

UnifiedLLMOrchestrator

```java
public class UnifiedLLMOrchestrator implements LLMOperations, ToolCapableLLMClient
```
Constructor Dependencies
| Dependency | Description |
|---|---|
| `ModelSelectionStrategy` | Selects the `ChatModel` based on execution context. |
| `StreamingHandler` | Handles streaming response processing. |
| `TieredLLMProperties` | Configuration for the model tier hierarchy. |
| `AdvisorRegistry` | Registry of Spring AI Advisors applied to each request. |
Key Behaviors
- Advisor Integration — Automatically applies all enabled advisors from `AdvisorRegistry`.
- Model Selection — Delegates to `ModelSelectionStrategy.selectModel()`, considering tier, preferred model, analysis level, and security task type.
- Retry Logic — Exponential backoff retry (up to 2 retries) for `IOException`.
- Ollama Optimization — Detects `OllamaChatModel` and applies `OllamaOptions` with model name, temperature, topP, and numPredict.
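The retry behavior (exponential backoff, up to 2 retries, `IOException` only) can be sketched as below. This is a minimal blocking illustration, not the orchestrator's Reactor-based implementation; delays are recorded rather than slept so the growth pattern is visible:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Sketch: retry an IO-bound call up to 2 times with exponentially growing delays.
final class RetrySketch {
    static <T> T callWithRetry(Callable<T> call, long baseDelayMs, List<Long> observedDelays) throws Exception {
        int maxRetries = 2;
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                if (attempt >= maxRetries) throw e;  // retry budget exhausted
                long delay = baseDelayMs << attempt; // 1x, 2x, ... the base delay
                observedDelays.add(delay);           // recorded instead of sleeping, for clarity
            }
        }
    }
}
```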