LLM & Models
The UnifiedLLMOrchestrator is the central LLM client for the Contexa platform. It provides a single entry point for synchronous calls, streaming, structured entity extraction, and tool-calling — with tiered model selection, advisor integration, and automatic retry logic.
Overview
The orchestrator implements both LLMOperations (context-based API) and ToolCapableLLMClient (prompt-based API), providing flexible access to LLM models across the platform.
When to Use Directly
In most cases, you do not need to call the orchestrator directly. The Pipeline's LLM_EXECUTION step calls it automatically. Direct use is appropriate when:
| Scenario | Recommended Approach |
|---|---|
| Full AI feature with structured input/output | Use the Strategy/Lab/Pipeline architecture. The pipeline calls the orchestrator for you. |
| Simple chat or text generation | Use the orchestrator directly with ExecutionContext. |
| Tool-calling / agentic workflows | Use the orchestrator's callTools() or callToolCallbacks() methods directly. |
| Custom model selection or temperature | Use ExecutionContext.builder() with explicit tier, model, or temperature settings. |
Tiered Model Architecture
The orchestrator uses tier-aware model selection. In the current OSS runtime, TieredLLMProperties exposes two configured layers (layer1 and layer2). Higher semantic tiers from AnalysisLevel or SecurityTaskType are normalized to the configured layer-2 runtime unless an explicit model is requested.
| Tier | Characteristics | Typical Use |
|---|---|---|
| Tier 1 | Fast, low temperature (e.g., 0.3) | Configured runtime layer 1 for quick detection, threat filtering, and latency-sensitive decisions |
| Tier 2 | Balanced, moderate temperature (e.g., 0.7) | Configured runtime layer 2 for contextual analysis, behavior analysis, and correlation |
| Tier 3 | Semantic "deep" tier used by enums such as DEEP and EXPERT_INVESTIGATION | Normalized to the configured layer-2 runtime unless an explicit model override is supplied |
How Tier Selection Works
The effective model is resolved in this order:
1. `preferredModel` or execution metadata such as `requestedModelId`, `preferredModel`, `runtimeModelId`, or `modelId` — explicit model IDs bypass tier mapping entirely.
2. `analysisLevel` — QUICK→1, NORMAL→2, DEEP→3 (normalized to configured layer 2 at runtime).
3. `securityTaskType` — security tasks map to default semantic tiers when `analysisLevel` is absent.
4. `tier` — explicit numeric tier when no higher-priority selector is present; values above 1 are normalized to layer 2.
5. The primary `ChatModel` bean as the final fallback if no configured model can be resolved.
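This resolution order can be sketched as pure logic. The class and method names below are illustrative only, not the orchestrator's actual internals; they model the documented precedence under the assumption that layers 1 and 2 are the only configured runtimes:

```java
import java.util.Map;
import java.util.Optional;

// Illustrative sketch of the documented model-resolution order; not the real implementation.
final class ModelResolutionSketch {

    /** Step 1: an explicit model ID (preferredModel or metadata) bypasses tier mapping entirely. */
    static Optional<String> explicitModel(String preferredModel, Map<String, Object> metadata) {
        if (preferredModel != null) return Optional.of(preferredModel);
        for (String key : new String[] {"requestedModelId", "preferredModel", "runtimeModelId", "modelId"}) {
            Object v = metadata.get(key);
            if (v instanceof String s && !s.isBlank()) return Optional.of(s);
        }
        return Optional.empty();
    }

    /** Steps 2–4: map analysisLevel or explicit tier to a configured runtime layer (1 or 2). */
    static int effectiveLayer(Integer tier, String analysisLevel) {
        int semanticTier;
        if (analysisLevel != null) {
            semanticTier = switch (analysisLevel) {
                case "QUICK" -> 1;
                case "NORMAL" -> 2;
                case "DEEP" -> 3;
                default -> 2;
            };
        } else if (tier != null) {
            semanticTier = tier;
        } else {
            semanticTier = 2; // fall back to the balanced layer
        }
        // Values above 1 are normalized to the configured layer-2 runtime.
        return semanticTier <= 1 ? 1 : 2;
    }
}
```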
AnalysisLevel Enum
| Value | Default Tier | Default Timeout |
|---|---|---|
| QUICK | 1 | 50ms |
| NORMAL | 2 | 300ms |
| DEEP | 3 | 5000ms |
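The defaults above can be mirrored as a small enum sketch. This is illustrative only; the real `AnalysisLevel` is nested in `ExecutionContext` and its field names may differ:

```java
// Sketch of the documented AnalysisLevel defaults; field names are illustrative.
enum AnalysisLevelSketch {
    QUICK(1, 50),
    NORMAL(2, 300),
    DEEP(3, 5000);

    final int defaultTier;
    final long defaultTimeoutMs;

    AnalysisLevelSketch(int defaultTier, long defaultTimeoutMs) {
        this.defaultTier = defaultTier;
        this.defaultTimeoutMs = defaultTimeoutMs;
    }
}
```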
ExecutionContext
The primary way to configure an LLM request, built with the @Builder pattern:
```java
ExecutionContext context = ExecutionContext.builder()
    .prompt(new Prompt("Analyze this for threats"))
    .analysisLevel(ExecutionContext.AnalysisLevel.NORMAL)
    .userId("user-123")
    .requestId(UUID.randomUUID().toString())
    .build();

// Execute
Mono<String> result = orchestrator.execute(context);

// Or stream
Flux<String> chunks = orchestrator.stream(context);
```
Key Properties
| Property | Type | Description |
|---|---|---|
| `prompt` | `Prompt` | The Spring AI prompt to send to the LLM. |
| `analysisLevel` | `AnalysisLevel` | Analysis depth: QUICK, NORMAL, or DEEP. Determines the default tier. |
| `tier` | `Integer` | Explicit semantic tier. The current runtime accepts 1 or 2 as configured layers and normalizes higher values to layer 2. |
| `preferredModel` | `String` | Explicit model name. Bypasses all tier-based selection. |
| `userId` | `String` | Authenticated user ID for security context. |
| `requestId` | `String` | Unique request ID for tracing. |
| `temperature` | `Double` | Sampling temperature override. |
| `maxTokens` | `Integer` | Maximum output tokens override. |
| `streamingMode` | `Boolean` | Whether to use streaming execution. |
| `toolCallbacks` | `List<ToolCallback>` | Tool callbacks for tool-calling execution. |
| `toolProviders` | `List<Object>` | Tool provider objects for tool-calling execution. |
| `seed` | `Integer` | Deterministic sampling seed. Useful for reproducible analysis results. |
| `chatOptions` | `ChatOptions` | Spring AI `ChatOptions` to pass directly to the model. Overrides individual temperature/maxTokens settings. |
| `advisors` | `List<Advisor>` | Request-scoped advisors applied in addition to the global `AdvisorRegistry`. |
| `metadata` | `Map<String, Object>` | Arbitrary metadata. Recognized keys: `disableRetries` (skip retry logic), `disableOllamaThinking` (suppress Ollama ThinkOption). |
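The rule that `chatOptions` wins over the individual `temperature`/`maxTokens` fields can be sketched as a precedence function. This helper is hypothetical, written only to make the documented ordering concrete:

```java
// Illustrative precedence: explicit ChatOptions win over individual overrides,
// which in turn win over the tier default.
final class OptionPrecedenceSketch {
    static double effectiveTemperature(Double chatOptionsTemp, Double contextTemp, double tierDefault) {
        if (chatOptionsTemp != null) return chatOptionsTemp; // ChatOptions wins
        if (contextTemp != null) return contextTemp;         // then the context-level override
        return tierDefault;                                  // then the tier default
    }
}
```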
Factory & Helper Methods
| Method | Return | Description |
|---|---|---|
| `ExecutionContext.from(Prompt)` | `ExecutionContext` | Create a minimal context from a Spring AI `Prompt`. |
| `ExecutionContext.forTier(int, Prompt)` | `ExecutionContext` | Create a context locked to a specific tier. |
| `addMetadata(String, Object)` | `ExecutionContext` | Fluent setter to add a single metadata entry. |
| `addAdvisor(Advisor)` | `ExecutionContext` | Fluent setter to add a request-scoped advisor. |
| `addToolCallback(ToolCallback)` | `ExecutionContext` | Fluent setter to add a tool callback. |
| `getEffectiveTier()` | `int` | Resolve the actual tier from explicit `tier`, `analysisLevel`, or `securityTaskType`. |
SecurityTaskType Enum (full list)
Tasks that default to tier 3 still flow through the configured layer-2 model unless `preferredModel` or metadata explicitly selects another model.
| Value | Default Tier | Description |
|---|---|---|
| THREAT_FILTERING | 1 | Fast threat filtering for real-time requests. |
| QUICK_DETECTION | 1 | Quick anomaly detection with minimal latency. |
| CONTEXTUAL_ANALYSIS | 2 | Context-aware security analysis. |
| BEHAVIOR_ANALYSIS | 2 | User behavior pattern analysis. |
| CORRELATION | 2 | Cross-event correlation analysis. |
| EXPERT_INVESTIGATION | 3 | Deep expert-level investigation. |
| SOAR_AUTOMATION | 3 | SOAR orchestration and automated incident response (Enterprise). |
| INCIDENT_RESPONSE | 3 | Automated incident response planning. |
| FORENSIC_ANALYSIS | 3 | Forensic analysis of security events. |
| APPROVAL_WORKFLOW | 3 | Human-in-the-loop approval workflows. |
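The table's default-tier mapping, combined with the layer-2 normalization described above, can be sketched as follows. The class is illustrative; the real mapping lives on the SecurityTaskType enum itself:

```java
import java.util.Map;

// Sketch of the documented default-tier mapping and layer-2 normalization; not the real enum.
final class SecurityTaskTierSketch {
    static final Map<String, Integer> DEFAULT_TIER = Map.of(
        "THREAT_FILTERING", 1,
        "QUICK_DETECTION", 1,
        "CONTEXTUAL_ANALYSIS", 2,
        "BEHAVIOR_ANALYSIS", 2,
        "CORRELATION", 2,
        "EXPERT_INVESTIGATION", 3,
        "SOAR_AUTOMATION", 3,
        "INCIDENT_RESPONSE", 3,
        "FORENSIC_ANALYSIS", 3,
        "APPROVAL_WORKFLOW", 3);

    /** Tier-3 tasks still run on the configured layer-2 runtime unless a model is pinned. */
    static int runtimeLayer(String taskType) {
        int tier = DEFAULT_TIER.getOrDefault(taskType, 2);
        return tier <= 1 ? 1 : 2;
    }
}
```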
Tool Calling
The orchestrator supports Spring AI tool calling for agentic workflows. Two approaches:
Tool Providers (Annotated Methods)
```java
// Tool provider objects - Spring AI discovers @Tool methods
Prompt prompt = new Prompt("Block suspicious IP 192.168.1.100");
Mono<String> result = orchestrator.callTools(
    prompt, List.of(ipBlockTool, auditTool));
```
ToolCallback Array (Explicit)
```java
// Explicit ToolCallback array
ToolCallback[] callbacks = new ToolCallback[] {
    myToolCallback1, myToolCallback2
};
Mono<String> result = orchestrator.callToolCallbacks(
    prompt, callbacks);
```
Both approaches also support streaming: `streamTools()` and `streamToolCallbacks()`.
Advisor System
The orchestrator pulls enabled advisors from the AdvisorRegistry and combines them with any request-scoped advisors attached to the ExecutionContext. Those advisors are applied when the orchestrator builds a chat call for the selected model.
Implementing a Custom Advisor
Extend `BaseAdvisor` to create advisors that intercept LLM requests and responses:
```java
@Component
public class RequestLoggingAdvisor extends BaseAdvisor {

    public RequestLoggingAdvisor() {
        super("myapp", "request-logging", 100);
        // domain, name, order (lower = runs first)
    }

    @Override
    public ChatClientRequest beforeCall(ChatClientRequest request) {
        // Inspect or modify the request before the LLM call,
        // e.g., log the prompt, add metadata
        return request;
    }

    @Override
    public ChatClientResponse afterCall(ChatClientResponse response,
                                        ChatClientRequest request) {
        // Inspect or modify the response after the LLM call,
        // e.g., validate the response, log metrics
        return response;
    }
}
```
Advisors are auto-registered as Spring beans. The AdvisorRegistry manages enabled/disabled state and provides sorted advisor snapshots to the orchestrator.
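The "lower order runs first" rule amounts to an ascending sort on the order value. A plain-Java sketch (illustrative; the real registry works with Spring AI's Advisor contract, not this record):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an advisor's identity and order value.
record AdvisorSketch(String name, int order) {}

// Sketch: advisors are applied in ascending order, so lower values run first.
final class AdvisorOrderingSketch {
    static List<String> executionOrder(List<AdvisorSketch> advisors) {
        List<AdvisorSketch> sorted = new ArrayList<>(advisors);
        sorted.sort(Comparator.comparingInt(AdvisorSketch::order));
        return sorted.stream().map(AdvisorSketch::name).toList();
    }
}
```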
Configuration
| Property | Description | Default |
|---|---|---|
| `spring.ai.security.layer1.model` | Primary model selected for layer 1 requests | `qwen2.5:14b` |
| `spring.ai.security.layer2.model` | Primary model selected for layer 2 requests | `exaone3.5:latest` |
| `spring.ai.security.tiered.layer1.timeout-ms` | Execution timeout budget for layer 1 model calls | `30000` |
| `contexa.llm.chat.ollama.base-url` | Ollama chat runtime base URL when the dedicated Contexa Ollama path is enabled | empty |
| `contexa.llm.chat.ollama.model` | Default Ollama chat model for the Contexa-managed runtime | empty |
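Assuming a standard Spring `application.yml` layout, these properties might be set as follows. The values shown are the documented defaults, except the Ollama entries, which are empty by default; the base URL below is only an example:

```yaml
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:14b
      layer2:
        model: exaone3.5:latest
      tiered:
        layer1:
          timeout-ms: 30000

contexa:
  llm:
    chat:
      ollama:
        base-url: http://localhost:11434   # empty by default; example value only
        model: ""                          # empty by default
```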
See AI Configuration for all model, tier, advisor, and retry properties.
API Reference
LLMOperations Interface

```java
public interface LLMOperations
```

LLMClient Interface

```java
public interface LLMClient
```

ToolCapableLLMClient Interface

```java
public interface ToolCapableLLMClient extends LLMClient
```

Extends `LLMClient` with tool-calling methods whose results surface the full `ChatResponse`, including tool call details.

UnifiedLLMOrchestrator

```java
public class UnifiedLLMOrchestrator implements LLMOperations, ToolCapableLLMClient
```
Constructor Dependencies
| Dependency | Description |
|---|---|
| `ModelSelectionStrategy` | Selects the `ChatModel` based on execution context. |
| `StreamingHandler` | Handles streaming response processing. |
| `TieredLLMProperties` | Configuration for the model tier hierarchy. |
| `AdvisorRegistry` | Registry of Spring AI Advisors applied to each request. |
Key Behaviors
- Advisor Integration — Automatically applies all enabled advisors from `AdvisorRegistry`.
- Model Selection — Delegates to `ModelSelectionStrategy.selectModel()`, considering tier, preferred model, analysis level, and security task type.
- Retry Logic — Exponential backoff retry (up to 2 retries) for `IOException`.
- Ollama Optimization — Detects `OllamaChatModel` and applies `OllamaOptions` with model name, temperature, topP, and numPredict.
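The retry behavior (exponential backoff, up to 2 retries, `IOException` only) can be sketched as below. This is a minimal blocking illustration, not the orchestrator's Reactor-based implementation; delays are recorded rather than slept so the growth pattern is visible:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Sketch: retry an IO-bound call up to 2 times with exponentially growing delays.
final class RetrySketch {
    static <T> T callWithRetry(Callable<T> call, long baseDelayMs, List<Long> observedDelays) throws Exception {
        int maxRetries = 2;
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                if (attempt >= maxRetries) throw e;  // retry budget exhausted
                long delay = baseDelayMs << attempt; // 1x, 2x, ... the base delay
                observedDelays.add(delay);           // recorded instead of sleeping, for clarity
            }
        }
    }
}
```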