contexa-core

LLM & Models

The UnifiedLLMOrchestrator is the central LLM client for the Contexa platform. It provides a single entry point for synchronous calls, streaming, structured entity extraction, and tool-calling — with tiered model selection, advisor integration, and automatic retry logic.

Overview

The orchestrator implements both LLMOperations (context-based API) and ToolCapableLLMClient (prompt-based API), providing flexible access to LLM models across the platform.

When to Use Directly

In most cases, you do not need to call the orchestrator directly. The Pipeline's LLM_EXECUTION step calls it automatically. Direct use is appropriate when:

| Scenario | Recommended Approach |
|---|---|
| Full AI feature with structured input/output | Use the Strategy/Lab/Pipeline architecture. The pipeline calls the orchestrator for you. |
| Simple chat or text generation | Use the orchestrator directly with ExecutionContext. |
| Tool-calling / agentic workflows | Use the orchestrator's callTools() or callToolCallbacks() methods directly. |
| Custom model selection or temperature | Use ExecutionContext.builder() with explicit tier, model, or temperature settings. |

Tiered Model Architecture

The orchestrator uses tier-aware model selection. In the current OSS runtime, TieredLLMProperties exposes two configured layers (layer1 and layer2). Higher semantic tiers from AnalysisLevel or SecurityTaskType are normalized to the configured layer-2 runtime unless an explicit model is requested.

| Tier | Characteristics | Typical Use |
|---|---|---|
| Tier 1 | Fast, low temperature (e.g., 0.3) | Configured runtime layer 1 for quick detection, threat filtering, and latency-sensitive decisions |
| Tier 2 | Balanced, moderate temperature (e.g., 0.7) | Configured runtime layer 2 for contextual analysis, behavior analysis, and correlation |
| Tier 3 | Semantic "deep" tier used by enums such as DEEP and EXPERT_INVESTIGATION | Normalized to the configured layer-2 runtime unless an explicit model override is supplied |

How Tier Selection Works

The effective model is resolved in this order:

  1. An explicit model — the preferredModel property, or execution metadata keys such as requestedModelId, preferredModel, runtimeModelId, or modelId. Explicit model IDs bypass tier mapping entirely
  2. analysisLevel — QUICK→1, NORMAL→2, DEEP→3 (normalized to configured layer 2 at runtime)
  3. securityTaskType — security tasks map to default semantic tiers when analysisLevel is absent
  4. tier — explicit numeric tier when no higher-priority selector is present; values above 1 are normalized to layer 2
  5. The primary ChatModel bean as the final fallback if no configured model can be resolved
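The precedence above can be illustrated with a small, self-contained sketch. Everything here is a simplified stand-in: the Request record, the resolve helper, and its parameters are hypothetical, not the real ModelSelectionStrategy API.

```java
import java.util.Map;

// Simplified, hypothetical sketch of the selection precedence described above.
// Names are stand-ins; the real ModelSelectionStrategy is more involved.
public class ModelSelectionSketch {

    record Request(String preferredModel, Map<String, Object> metadata,
                   Integer analysisLevelTier, Integer securityTaskTier, Integer tier) {}

    static String resolve(Request r, String layer1Model, String layer2Model,
                          String primaryFallback) {
        // 1. Explicit model IDs bypass tier mapping entirely.
        if (r.preferredModel() != null) return r.preferredModel();
        for (String key : new String[] {"requestedModelId", "preferredModel",
                                        "runtimeModelId", "modelId"}) {
            Object v = r.metadata().get(key);
            if (v instanceof String s && !s.isBlank()) return s;
        }
        // 2-4. Tier-based selection: analysisLevel, then securityTaskType, then tier.
        Integer tier = r.analysisLevelTier() != null ? r.analysisLevelTier()
                     : r.securityTaskTier() != null ? r.securityTaskTier()
                     : r.tier();
        if (tier != null) {
            // Tiers above 1 are normalized to the configured layer-2 runtime.
            return tier <= 1 ? layer1Model : layer2Model;
        }
        // 5. Primary ChatModel bean as the final fallback.
        return primaryFallback;
    }

    public static void main(String[] args) {
        // DEEP (semantic tier 3) with no explicit model normalizes to layer 2.
        System.out.println(resolve(new Request(null, Map.of(), 3, null, null),
                "qwen2.5:14b", "exaone3.5:latest", "primary"));
        // An explicit metadata model ID wins over any tier.
        System.out.println(resolve(new Request(null, Map.of("modelId", "custom:7b"),
                1, null, null), "qwen2.5:14b", "exaone3.5:latest", "primary"));
    }
}
```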

AnalysisLevel Enum

| Value | Default Tier | Default Timeout |
|---|---|---|
| QUICK | 1 | 50ms |
| NORMAL | 2 | 300ms |
| DEEP | 3 | 5000ms |
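An enum carrying these defaults might look like the following sketch (the field and accessor names are hypothetical; the real AnalysisLevel is nested in ExecutionContext):

```java
// Hypothetical sketch of AnalysisLevel carrying the defaults from the table above.
public enum AnalysisLevelSketch {
    QUICK(1, 50),
    NORMAL(2, 300),
    DEEP(3, 5000);

    private final int defaultTier;
    private final long defaultTimeoutMs;

    AnalysisLevelSketch(int defaultTier, long defaultTimeoutMs) {
        this.defaultTier = defaultTier;
        this.defaultTimeoutMs = defaultTimeoutMs;
    }

    public int defaultTier() { return defaultTier; }
    public long defaultTimeoutMs() { return defaultTimeoutMs; }
}
```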

ExecutionContext

ExecutionContext is the primary way to configure an LLM request. It is constructed via its @Builder-generated fluent API:

```java
ExecutionContext context = ExecutionContext.builder()
    .prompt(new Prompt("Analyze this for threats"))
    .analysisLevel(ExecutionContext.AnalysisLevel.NORMAL)
    .userId("user-123")
    .requestId(UUID.randomUUID().toString())
    .build();

// Execute
Mono<String> result = orchestrator.execute(context);

// Or stream
Flux<String> chunks = orchestrator.stream(context);
```

Key Properties

| Property | Type | Description |
|---|---|---|
| prompt | Prompt | The Spring AI prompt to send to the LLM. |
| analysisLevel | AnalysisLevel | Analysis depth: QUICK, NORMAL, or DEEP. Determines the default tier. |
| tier | Integer | Explicit semantic tier. The current runtime accepts 1 or 2 as configured layers and normalizes higher values to layer 2. |
| preferredModel | String | Explicit model name. Bypasses all tier-based selection. |
| userId | String | Authenticated user ID for security context. |
| requestId | String | Unique request ID for tracing. |
| temperature | Double | Sampling temperature override. |
| maxTokens | Integer | Maximum output tokens override. |
| streamingMode | Boolean | Whether to use streaming execution. |
| toolCallbacks | List<ToolCallback> | Tool callbacks for tool-calling execution. |
| toolProviders | List<Object> | Tool provider objects for tool-calling execution. |
| seed | Integer | Deterministic sampling seed. Useful for reproducible analysis results. |
| chatOptions | ChatOptions | Spring AI ChatOptions passed directly to the model. Overrides individual temperature/maxTokens settings. |
| advisors | List<Advisor> | Request-scoped advisors applied in addition to the global AdvisorRegistry. |
| metadata | Map<String, Object> | Arbitrary metadata. Recognized keys: disableRetries (skip retry logic), disableOllamaThinking (suppress Ollama ThinkOption). |
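As an illustration of how the recognized metadata flags might be consulted, here is a minimal, self-contained sketch. The MetadataFlags class and its method names are hypothetical, not part of the Contexa API.

```java
import java.util.Map;

// Hypothetical sketch: reading the recognized metadata flags listed above
// before applying retries or the Ollama ThinkOption.
public final class MetadataFlags {

    private static boolean flag(Map<String, Object> metadata, String key) {
        Object v = metadata.get(key);
        // Accept either a Boolean or its string form; absent keys are false.
        return v instanceof Boolean b ? b : Boolean.parseBoolean(String.valueOf(v));
    }

    public static boolean retriesDisabled(Map<String, Object> metadata) {
        return flag(metadata, "disableRetries");
    }

    public static boolean ollamaThinkingDisabled(Map<String, Object> metadata) {
        return flag(metadata, "disableOllamaThinking");
    }
}
```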

Factory & Helper Methods

| Method | Returns | Description |
|---|---|---|
| ExecutionContext.from(Prompt) | ExecutionContext | Creates a minimal context from a Spring AI Prompt. |
| ExecutionContext.forTier(int, Prompt) | ExecutionContext | Creates a context locked to a specific tier. |
| addMetadata(String, Object) | ExecutionContext | Fluent setter to add a single metadata entry. |
| addAdvisor(Advisor) | ExecutionContext | Fluent setter to add a request-scoped advisor. |
| addToolCallback(ToolCallback) | ExecutionContext | Fluent setter to add a tool callback. |
| getEffectiveTier() | int | Resolves the actual tier from the explicit tier, analysisLevel, or securityTaskType. |

SecurityTaskType Enum (full list)

Tasks that default to tier 3 still flow through the configured layer-2 model unless preferredModel or metadata explicitly selects another model.

| Value | Default Tier | Description |
|---|---|---|
| THREAT_FILTERING | 1 | Fast threat filtering for real-time requests. |
| QUICK_DETECTION | 1 | Quick anomaly detection with minimal latency. |
| CONTEXTUAL_ANALYSIS | 2 | Context-aware security analysis. |
| BEHAVIOR_ANALYSIS | 2 | User behavior pattern analysis. |
| CORRELATION | 2 | Cross-event correlation analysis. |
| EXPERT_INVESTIGATION | 3 | Deep expert-level investigation. |
| SOAR_AUTOMATION | 3 | SOAR orchestration and automated incident response (Enterprise). |
| INCIDENT_RESPONSE | 3 | Automated incident response planning. |
| FORENSIC_ANALYSIS | 3 | Forensic analysis of security events. |
| APPROVAL_WORKFLOW | 3 | Human-in-the-loop approval workflows. |
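The default tiers and the layer normalization described above can be sketched as a self-contained enum. This is a hypothetical stand-in, not the real SecurityTaskType; the runtimeLayer() helper in particular is illustrative.

```java
// Hypothetical sketch of SecurityTaskType defaults and layer normalization:
// semantic tier 3 tasks still run on the configured layer-2 model unless an
// explicit model is requested.
public enum SecurityTaskTypeSketch {
    THREAT_FILTERING(1), QUICK_DETECTION(1),
    CONTEXTUAL_ANALYSIS(2), BEHAVIOR_ANALYSIS(2), CORRELATION(2),
    EXPERT_INVESTIGATION(3), SOAR_AUTOMATION(3), INCIDENT_RESPONSE(3),
    FORENSIC_ANALYSIS(3), APPROVAL_WORKFLOW(3);

    private final int defaultTier;

    SecurityTaskTypeSketch(int defaultTier) { this.defaultTier = defaultTier; }

    public int defaultTier() { return defaultTier; }

    /** The configured runtime layer used when no explicit model is requested. */
    public int runtimeLayer() { return defaultTier <= 1 ? 1 : 2; }
}
```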

Tool Calling

The orchestrator supports Spring AI tool calling for agentic workflows. Two approaches:

Tool Providers (Annotated Methods)

```java
// Tool provider objects - Spring AI discovers @Tool methods
Prompt prompt = new Prompt("Block suspicious IP 192.168.1.100");

Mono<String> result = orchestrator.callTools(
    prompt, List.of(ipBlockTool, auditTool));
```

ToolCallback Array (Explicit)

```java
// Explicit ToolCallback array
Prompt prompt = new Prompt("Block suspicious IP 192.168.1.100");

ToolCallback[] callbacks = new ToolCallback[] {
    myToolCallback1, myToolCallback2
};

Mono<String> result = orchestrator.callToolCallbacks(
    prompt, callbacks);
```

Both approaches also support streaming: streamTools() and streamToolCallbacks().

Advisor System

The orchestrator pulls enabled advisors from the AdvisorRegistry and combines them with any request-scoped advisors attached to the ExecutionContext. Those advisors are applied when the orchestrator builds a chat call for the selected model.

Implementing a Custom Advisor

Extend BaseAdvisor to create advisors that intercept LLM requests and responses:

```java
@Component
public class RequestLoggingAdvisor extends BaseAdvisor {

    public RequestLoggingAdvisor() {
        super("myapp", "request-logging", 100);
        // domain, name, order (lower = runs first)
    }

    @Override
    public ChatClientRequest beforeCall(
            ChatClientRequest request) {
        // Inspect or modify the request before LLM call
        // e.g., log the prompt, add metadata
        return request;
    }

    @Override
    public ChatClientResponse afterCall(
            ChatClientResponse response,
            ChatClientRequest request) {
        // Inspect or modify the response after LLM call
        // e.g., validate response, log metrics
        return response;
    }
}
```

Advisors are auto-registered as Spring beans. The AdvisorRegistry manages enabled/disabled state and provides sorted advisor snapshots to the orchestrator.
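The ordering contract (lower order runs first, disabled advisors excluded) can be sketched as follows. This is a minimal stand-in; AdvisorEntry and sortedEnabled are hypothetical, not the real AdvisorRegistry API.

```java
import java.util.Comparator;
import java.util.List;

// Minimal sketch of the registry's ordering contract: advisors are handed to
// the orchestrator sorted by their order value, lower values running first,
// with disabled advisors excluded.
public class AdvisorOrderingSketch {

    record AdvisorEntry(String name, int order, boolean enabled) {}

    static List<String> sortedEnabled(List<AdvisorEntry> advisors) {
        return advisors.stream()
                .filter(AdvisorEntry::enabled)
                .sorted(Comparator.comparingInt(AdvisorEntry::order))
                .map(AdvisorEntry::name)
                .toList();
    }
}
```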

Configuration

| Property | Description | Default |
|---|---|---|
| spring.ai.security.layer1.model | Primary model selected for layer 1 requests | qwen2.5:14b |
| spring.ai.security.layer2.model | Primary model selected for layer 2 requests | exaone3.5:latest |
| spring.ai.security.tiered.layer1.timeout-ms | Execution timeout budget for layer 1 model calls | 30000 |
| contexa.llm.chat.ollama.base-url | Ollama chat runtime base URL when the dedicated Contexa Ollama path is enabled | empty |
| contexa.llm.chat.ollama.model | Default Ollama chat model for the Contexa-managed runtime | empty |

Full Configuration Reference

See AI Configuration for all model, tier, advisor, and retry properties.
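As a sketch, an application.properties fragment covering the properties named on this page might look like the following. The Ollama base URL and model values are placeholders (both default to empty); only set them when enabling the dedicated Contexa Ollama path.

```properties
# Layer models (defaults shown in the table above)
spring.ai.security.layer1.model=qwen2.5:14b
spring.ai.security.layer2.model=exaone3.5:latest

# Layer 1 execution timeout budget (ms)
spring.ai.security.tiered.layer1.timeout-ms=30000

# Optional dedicated Contexa-managed Ollama chat runtime (placeholder values)
contexa.llm.chat.ollama.base-url=http://localhost:11434
contexa.llm.chat.ollama.model=qwen2.5:14b
```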

API Reference

LLMOperations Interface

public interface LLMOperations

| Method | Returns | Description |
|---|---|---|
| execute(ExecutionContext context) | Mono<String> | Executes an LLM call with full context including tier, model preferences, and tools. |
| stream(ExecutionContext context) | Flux<String> | Streams the LLM response. |
| executeEntity(ExecutionContext context, Class<T> targetType) | Mono<T> | Executes and deserializes the response into a typed entity. |

LLMClient Interface

public interface LLMClient

| Method | Returns | Description |
|---|---|---|
| call(Prompt prompt) | Mono<String> | Sends a prompt and returns the text response. |
| entity(Prompt prompt, Class<T> targetType) | Mono<T> | Sends a prompt and deserializes the response. |
| stream(Prompt prompt) | Flux<String> | Streams the response as string chunks. |

ToolCapableLLMClient Interface

public interface ToolCapableLLMClient extends LLMClient

| Method | Returns | Description |
|---|---|---|
| callTools(Prompt prompt, List<Object> toolProviders) | Mono<String> | Calls the LLM with tool provider objects. |
| callToolCallbacks(Prompt prompt, ToolCallback[] toolCallbacks) | Mono<String> | Calls the LLM with an explicit ToolCallback array. |
| callToolsResponse(Prompt prompt, List<Object> toolProviders) | Mono<ChatResponse> | Returns the full ChatResponse including tool call details. |
| streamTools(Prompt prompt, List<Object> toolProviders) | Flux<String> | Streaming with tool calling enabled. |
| streamToolCallbacks(Prompt prompt, ToolCallback[] toolCallbacks) | Flux<String> | Streaming with an explicit ToolCallback array. |

UnifiedLLMOrchestrator

public class UnifiedLLMOrchestrator implements LLMOperations, ToolCapableLLMClient

Constructor Dependencies

| Dependency | Description |
|---|---|
| ModelSelectionStrategy | Selects the ChatModel based on execution context. |
| StreamingHandler | Handles streaming response processing. |
| TieredLLMProperties | Configuration for the model tier hierarchy. |
| AdvisorRegistry | Registry of Spring AI Advisors applied to each request. |

Key Behaviors

  • Advisor Integration — Automatically applies all enabled advisors from AdvisorRegistry.
  • Model Selection — Delegates to ModelSelectionStrategy.selectModel() considering tier, preferred model, analysis level, and security task type.
  • Retry Logic — Exponential backoff retry (up to 2 retries) for IOException.
  • Ollama Optimization — Detects OllamaChatModel and applies OllamaOptions with model name, temperature, topP, and numPredict.
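The retry behavior (up to 2 retries with exponential backoff, retrying only on IOException) can be sketched imperatively. The real orchestrator implements this reactively; here, backoff delays are collected rather than slept, and all names are hypothetical.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;

// Self-contained sketch of the retry policy described above: up to 2 retries,
// exponential backoff, retrying only on IOException.
public class RetrySketch {

    static <T> T withRetries(Callable<T> call, long initialBackoffMs,
                             List<Long> observedDelays) throws Exception {
        int maxRetries = 2;
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                if (attempt >= maxRetries) throw e; // retries exhausted
                // Exponential backoff: initial, 2x initial, ...
                observedDelays.add(initialBackoffMs << attempt);
            }
        }
    }
}
```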