AI Engine Configuration
Configuration properties for the Contexa AI engine, including the tiered LLM strategy, RAG, advisor chain, streaming pipeline, and tuning scenarios.
Quick Start: Minimal Config
The minimum configuration to run the OSS AI path with the built-in Ollama chat runtime and the tiered security models:
contexa:
  llm:
    chat:
      ollama:
        base-url: http://localhost:11434
        model: qwen2.5:7b
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:7b
      layer2:
        model: gpt-4o-mini
    vectorstore:
      pgvector:
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1024
This enables the Ollama chat runtime through contexa.llm.chat.ollama.*, assigns Tier 1 and Tier 2 models, and configures the project-declared pgvector properties. If you want a dedicated embedding runtime, add contexa.llm.embedding.ollama.* as shown below.
Configuration Architecture
The AI runtime is configured through seven property groups. Contexa owns the runtime selection, advisor, RAG, and streaming properties, while Spring AI owns the provider integrations and pgvector base configuration.
| Property Prefix | Bound To | Configures |
|---|---|---|
contexa.llm.* | ContexaProperties | Chat runtime selection, Ollama endpoints, model priority, dedicated embedding runtime |
spring.ai.security.* | TieredLLMProperties | Tier 1 and Tier 2 model selection, backup models, helper timeout accessors |
spring.ai.security.tiered.* | TieredStrategyProperties | Prompt budgets, truncation, vector cache, trusted proxy validation, RAG thresholds |
contexa.advisor.* | ContexaAdvisorProperties | Advisor chain profile, security advisor order, SOAR approval advisor settings |
contexa.rag.* | ContexaRagProperties | Default, behavior, risk, AI Lab, and ETL retrieval settings |
contexa.streaming.* | StreamingProperties | Protocol markers, retry policy, parser buffers, streaming timeout |
spring.ai.vectorstore.pgvector.* | PgVectorStoreProperties | Index type, dimensions, search/store limits, document chunking, HNSW/IVFFLAT tuning |
LLM Model Configuration
The tiered model system is split between runtime selection in contexa.llm.* and tier assignment in spring.ai.security.*. The OSS core ships with Ollama chat runtime wiring and can also use Spring AI OpenAI or Anthropic providers if those beans are present.
| Property | Default | Description |
|---|---|---|
contexa.llm.enabled | true | Master flag for Contexa LLM-dependent features |
contexa.llm.advisor-enabled | true | Enables the advisor chain registration path |
contexa.llm.selection.chat.mode | DYNAMIC_PRIORITY | Chat provider selection strategy: DYNAMIC_PRIORITY walks the priority list, SPRING_PRIMARY uses the Spring @Primary bean |
contexa.llm.selection.chat.priority | "" | Comma-separated chat provider order (e.g., ollama,anthropic,openai) |
contexa.llm.selection.embedding.mode | DYNAMIC_PRIORITY | Embedding provider selection strategy |
contexa.llm.selection.embedding.priority | "" | Comma-separated embedding provider order (e.g., ollama,openai) |
contexa.llm.chat.ollama.base-url | "" | Required to enable the built-in Ollama chat runtime |
contexa.llm.chat.ollama.model | "" | Optional explicit Ollama chat model; if omitted the auto-configuration falls back to qwen2.5:7b |
contexa.llm.chat.ollama.keep-alive | "" | Optional Ollama keep-alive hint passed to chat options |
contexa.llm.embedding.ollama.dedicated-runtime-enabled | false | Enables a dedicated Ollama embedding runtime instead of reusing the chat runtime |
contexa.llm.embedding.ollama.base-url | "" | Required when the dedicated embedding runtime is enabled |
contexa.llm.embedding.ollama.model | "" | Optional explicit embedding model; if omitted the auto-configuration falls back to mxbai-embed-large |
spring.ai.security.layer1.model | qwen2.5:7b | Tier 1 model selection |
spring.ai.security.layer1.backup.model | — | Tier 1 backup model |
spring.ai.security.layer2.model | gpt-4o-mini | Tier 2 model selection |
spring.ai.security.layer2.backup.model | — | Tier 2 backup model |
spring.ai.security.tiered.layer1.timeout-ms | 30000 | Tier 1 inference timeout in milliseconds; TieredLLMProperties falls back to this value when the property is unset or invalid |
spring.ai.security.tiered.layer2.timeout-ms | 60000 | Tier 2 inference timeout in milliseconds; TieredLLMProperties falls back to this value when the property is unset or invalid |
contexa:
  llm:
    selection:
      chat:
        mode: DYNAMIC_PRIORITY
        priority: ollama,anthropic,openai
      embedding:
        mode: DYNAMIC_PRIORITY
        priority: ollama,openai
    chat:
      ollama:
        base-url: http://localhost:11434
        model: qwen2.5:7b
    embedding:
      ollama:
        dedicated-runtime-enabled: true
        base-url: http://localhost:11435
        model: mxbai-embed-large
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:7b
        backup:
          model: llama3.2:latest
      layer2:
        model: gpt-4o-mini
        backup:
          model: deepseek-r1:14b
      tiered:
        layer1:
          timeout-ms: 30000
        layer2:
          timeout-ms: 60000
Recommended selection (OpenAI + Anthropic with Ollama failover)
The platform can chain managed cloud providers with the local Ollama runtime so that chat calls degrade gracefully when an API key is missing or a cloud endpoint is unreachable. The example below uses OpenAI as the primary chat provider, Anthropic as the second-tier failover, and the local Ollama runtime as the offline / no-key fallback. Embedding always goes through OpenAI.
contexa:
  llm:
    selection:
      chat:
        mode: dynamic-priority
        priority: openai,anthropic,ollama
      embedding:
        mode: dynamic-priority
        priority: openai
Key meaning
- `mode: dynamic-priority`: the orchestrator walks the comma-separated `priority` list in order and selects the first provider whose `ChatModel` / `EmbeddingModel` bean is present and reachable. The runtime form `dynamic-priority` is the kebab-case spelling of the `Mode.DYNAMIC_PRIORITY` enum; both forms bind to the same value through Spring's relaxed binding.
- `priority: openai,anthropic,ollama`: chat calls try `OpenAI` first; if the OpenAI client is missing the key or returns an unrecoverable error, the orchestrator falls back to `Anthropic`; if Anthropic is also unavailable, the call lands on the local `Ollama` runtime. List order is the failover order, so cheaper / more reliable providers should appear earlier.
- `priority: openai` (embedding): `Anthropic` does not ship an embedding model in Spring AI, and the embedding path is dimension-pinned to `1536` in the platform's pgvector schema, which matches `OpenAI`'s `text-embedding-3-small` / `text-embedding-ada-002`. Mixing in an Ollama embedding model (typically 768 / 1024 dimensions) would write vectors of an incompatible width into the same column and break similarity search. Keep embeddings on a single 1536-dimension provider.
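Because the vector column width has to match the embedding model, it can help to keep the embedding priority and the pgvector dimension side by side. The fragment below is a sketch for the OpenAI-only embedding setup described above. It assumes that spring.ai.vectorstore.pgvector.dimensions is the property that carries the 1536 width in your deployment, so verify it against your schema before applying it.

```yaml
contexa:
  llm:
    selection:
      embedding:
        mode: dynamic-priority
        priority: openai          # a single 1536-dimension provider
spring:
  ai:
    vectorstore:
      pgvector:
        # Assumption: this value must equal the embedding model's output width.
        # 1536 matches text-embedding-3-small / text-embedding-ada-002.
        dimensions: 1536
```

With the Ollama default used elsewhere on this page (mxbai-embed-large, a 1024-dimension model), the matching value is the 1024 shown in the Quick Start.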
Pair the selection block with the Spring AI provider properties
The contexa.llm.selection.* block decides which provider is called, but the actual API keys and model names live under the standard Spring AI property tree:
spring:
  ai:
    retry:
      max-attempts: 1
    anthropic:
      api-key: ${ANTHROPIC_API_KEY:disabled}
      chat:
        options:
          model: claude-3-sonnet-20240229
    openai:
      api-key: ${OPENAI_API_KEY:disabled}
      base-url: https://api.openai.com
      chat:
        options:
          model: gpt-4o-mini
          temperature: 0.3
- API key placeholders: the literal `disabled` default keeps the auto-configuration from registering a real client when the operator has not supplied an environment variable. Setting `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` activates the corresponding provider; leaving them as `disabled` intentionally skips that branch and lets the next provider in the priority list take over.
- `spring.ai.retry.max-attempts: 1`: with multiple providers chained via `dynamic-priority`, retrying the same failing provider inside Spring AI is wasted latency. A value of `1` keeps each call single-shot and lets the Contexa failover selector pick the next entry.
- Model identifiers: `gpt-4o-mini` is recommended for Tier 1 (low-latency Layer 1 contextual decisions) and `claude-3-sonnet-20240229` for Tier 2 (forensic Layer 2 reasoning). Swap in newer models as they become available.
- `temperature: 0.3` on the OpenAI side is intentional: security-decision prompts benefit from low temperature; raising it widens drift in BLOCK / ESCALATE boundaries.
OpenAI and Anthropic provider credentials are configured through Spring AI's own provider properties such as spring.ai.openai.* and spring.ai.anthropic.*; Contexa only selects between the available provider beans.
See also: LLM & Models Reference
RAG Configuration
These properties drive the retrieval path used by the pipeline and AI Lab. The OSS code binds contexa.rag.* to ContexaRagProperties and spring.ai.vectorstore.pgvector.* to PgVectorStoreProperties.
| Property | Default | Description |
|---|---|---|
contexa.rag.defaults.similarity-threshold | 0.7 | Default similarity threshold for general retrieval |
contexa.rag.defaults.top-k | 10 | Default retrieval size for general retrieval |
contexa.rag.behavior.lookback-days | 30 | Behavioral retrieval lookback window |
contexa.rag.risk.similarity-threshold | 0.8 | Risk-focused retrieval threshold |
contexa.rag.risk.top-k | 50 | Risk-focused retrieval result count |
contexa.rag.lab.batch-size | 50 | AI Lab batch size |
contexa.rag.lab.validation-enabled | true | Enables validation before ingestion |
contexa.rag.lab.enrichment-enabled | true | Enables metadata enrichment during AI Lab processing |
contexa.rag.lab.top-k | 100 | AI Lab retrieval size |
contexa.rag.lab.similarity-threshold | 0.75 | AI Lab similarity threshold |
contexa.rag.etl.batch-size | 100 | ETL ingestion batch size |
contexa.rag.etl.chunk-size | 500 | Chunk size for document splitting |
contexa.rag.etl.chunk-overlap | 50 | Chunk overlap for document splitting |
contexa.rag.etl.vector-table-name | vector_store | Logical target table name used by ETL workflows |
contexa.rag.etl.behavior.retention-days | 90 | Behavioral corpus retention period |
Vector Store (pgvector)
| Property | Default | Description |
|---|---|---|
spring.ai.vectorstore.pgvector.index-type | HNSW | Index implementation |
spring.ai.vectorstore.pgvector.distance-type | COSINE_DISTANCE | Distance metric |
spring.ai.vectorstore.pgvector.dimensions | 1024 | Embedding dimension bound expected by the store |
spring.ai.vectorstore.pgvector.batch-size | 100 | Store write batch size |
spring.ai.vectorstore.pgvector.parallel-threads | 4 | Parallel worker count |
spring.ai.vectorstore.pgvector.top-k | 100 | Default search result count |
spring.ai.vectorstore.pgvector.similarity-threshold | 0.5 | Default search similarity threshold |
spring.ai.vectorstore.pgvector.search-timeout-ms | 10000 | Search timeout |
spring.ai.vectorstore.pgvector.store-timeout-ms | 10000 | Store timeout |
spring.ai.vectorstore.pgvector.hnsw.m | 16 | HNSW graph connectivity |
spring.ai.vectorstore.pgvector.hnsw.ef-construction | 64 | HNSW construction effort |
spring.ai.vectorstore.pgvector.hnsw.ef-search | 100 | HNSW search effort |
spring.ai.vectorstore.pgvector.ivfflat.lists | 100 | IVFFLAT list count |
spring.ai.vectorstore.pgvector.ivfflat.probes | 10 | IVFFLAT probe count |
spring.ai.vectorstore.pgvector.document.chunk-size | 1000 | Document chunk size |
spring.ai.vectorstore.pgvector.document.chunk-overlap | 200 | Document chunk overlap |
spring.ai.vectorstore.pgvector.document.enrich-metadata | true | Metadata enrichment toggle |
spring.ai.vectorstore.pgvector.document.extract-keywords | true | Keyword extraction toggle |
spring.ai.vectorstore.pgvector.document.generate-summary | false | Document summary generation toggle |
contexa:
  rag:
    defaults:
      similarity-threshold: 0.7
      top-k: 10
    risk:
      similarity-threshold: 0.8
      top-k: 50
    lab:
      batch-size: 50
      validation-enabled: true
      enrichment-enabled: true
spring:
  ai:
    vectorstore:
      pgvector:
        dimensions: 1024
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        document:
          chunk-size: 1000
          chunk-overlap: 200
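The example above leaves the behavior and ETL groups, and the AI Lab retrieval limits, at their defaults. As a sketch, the fragment below spells those defaults out explicitly, using only the property names and default values listed in the RAG table; adjust the values to your own corpus.

```yaml
contexa:
  rag:
    behavior:
      lookback-days: 30          # behavioral retrieval lookback window
    lab:
      top-k: 100                 # AI Lab retrieval size
      similarity-threshold: 0.75
    etl:
      batch-size: 100
      chunk-size: 500
      chunk-overlap: 50          # overlap between adjacent chunks
      vector-table-name: vector_store
      behavior:
        retention-days: 90       # behavioral corpus retention period
```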
See also: Pipeline & RAG Reference
Streaming Configuration
| Property | Default | Description |
|---|---|---|
contexa.streaming.final-response-marker | ###FINAL_RESPONSE### | Marker emitted before the final structured response |
contexa.streaming.streaming-marker | ###STREAMING### | Marker prefix for streaming chunks |
contexa.streaming.json-start-marker | ===JSON_START=== | Marker indicating JSON output start |
contexa.streaming.json-end-marker | ===JSON_END=== | Marker indicating JSON output end |
contexa.streaming.timeout | PT5M | Overall streaming timeout |
contexa.streaming.max-retries | 3 | Retry count for stream recovery |
contexa.streaming.retry-delay | PT1S | Initial retry delay |
contexa.streaming.retry-multiplier | 1.5 | Retry backoff multiplier |
contexa.streaming.marker-buffer-size | 100 | Marker parsing buffer size |
contexa.streaming.sentence-buffering-enabled | true | Sentence buffering toggle for chunk smoothing |
contexa:
  streaming:
    timeout: 5m
    max-retries: 3
    retry-delay: 1s
    retry-multiplier: 1.5
    marker-buffer-size: 100
    sentence-buffering-enabled: true
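The marker properties do not appear in the example above. If you set them in YAML, keep in mind that an unquoted value starting with # is read as a comment, so the ### markers need quotes. A sketch using the default values from the table:

```yaml
contexa:
  streaming:
    # Quote the markers: an unquoted leading '#' would start a YAML comment
    # and the marker value would be lost.
    final-response-marker: "###FINAL_RESPONSE###"
    streaming-marker: "###STREAMING###"
    json-start-marker: "===JSON_START==="
    json-end-marker: "===JSON_END==="
```

On the retry settings: assuming the delay is multiplied by retry-multiplier after each failed attempt (an assumption; the table only names the two properties), the defaults would give waits of roughly 1 s, 1.5 s, and 2.25 s across the three retries.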
See also: Streaming Reference
Advisor Configuration
| Property | Default | Description |
|---|---|---|
contexa.advisor.chain-profile | STANDARD | Named advisor chain profile |
contexa.advisor.security.enabled | true | Enable the security advisor |
contexa.advisor.security.order | 50 | Execution order of the security advisor |
contexa.advisor.security.require-authentication | false | Require an authenticated principal before AI processing |
contexa.advisor.soar.approval.enabled | true | Enable the SOAR approval advisor |
contexa.advisor.soar.approval.order | 100 | Execution order of the SOAR approval advisor |
contexa.advisor.soar.approval.timeout | 300 | Approval timeout in seconds |
contexa:
  advisor:
    chain-profile: STANDARD
    security:
      enabled: true
      order: 50
      require-authentication: false
    soar:
      approval:
        enabled: true
        order: 100
        timeout: 300
See also: Advisor System Reference
Tuning Scenarios
Common configuration adjustments for specific situations:
Slow Responses
If inference is slow, reduce the model footprint and widen the Layer 1 pipeline budget:
contexa:
  llm:
    chat:
      ollama:
        model: qwen2.5:7b
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:7b
      tiered:
        layer1:
          timeout:
            total-ms: 5400000
            llm-ms: 3600000
            rag-ms: 1200000
RAG Results Are Inaccurate
If retrieved context is not relevant enough, tighten the default threshold and reduce the retrieval window:
contexa:
  rag:
    defaults:
      similarity-threshold: 0.85
      top-k: 3
spring:
  ai:
    security:
      tiered:
        layer1:
          rag:
            similarity-threshold: 0.7
          vector-search-limit: 6
Token Usage / Cost Optimization
Reduce prompt size by lowering the Layer 1 prompt and truncation budgets:
spring:
  ai:
    security:
      tiered:
        layer1:
          prompt:
            max-rag-documents: 2
            max-similar-events: 1
          vector-search-limit: 3
        truncation:
          layer1:
            payload: 100
            rag-document: 150
Production Deployment
Recommended settings for a production deployment with separate chat and embedding runtimes:
contexa:
  llm:
    selection:
      chat:
        mode: DYNAMIC_PRIORITY
        priority: ollama,anthropic,openai
      embedding:
        mode: DYNAMIC_PRIORITY
        priority: ollama,openai
    embedding:
      ollama:
        dedicated-runtime-enabled: true
        base-url: http://localhost:11435
  advisor:
    security:
      enabled: true
      require-authentication: true
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:7b
      layer2:
        model: gpt-4o-mini
      tiered:
        vector-cache:
          max-size: 50000
          expire-minutes: 10
          enabled: true
          record-stats: true
        security:
          trusted-proxy-validation-enabled: true
Complete Property Reference
TieredStrategyProperties — Layer 1 Settings
Property names in this table are relative to spring.ai.security.tiered.layer1.
| Property | Type | Default | Description |
|---|---|---|---|
.rag.similarity-threshold | double | 0.5 | Layer 1 RAG similarity threshold |
.session.max-recent-actions | int | 100 | Recent action window |
.cache.max-size | int | 1000 | Layer 1 cache size |
.cache.ttl-minutes | int | 30 | Layer 1 cache TTL |
.timeout.total-ms | long | 5000 | Total Layer 1 budget (ms) |
.timeout.llm-ms | long | 3200 | Layer 1 LLM budget (ms) |
.timeout.rag-ms | long | 900 | Layer 1 RAG budget (ms) |
.vector-search-limit | int | 3 | Vector search result cap |
.default-budget-profile | String | CORTEX_L1_INTERACTIVE_STRICT | Named Layer 1 budget profile |
.prompt.max-similar-events | int | 2 | Prompt similar-event cap |
.prompt.max-rag-documents | int | 3 | Prompt RAG document cap |
.prompt.include-event-id | boolean | false | Include event id in prompt |
.prompt.include-raw-timestamp | boolean | false | Include raw timestamp in prompt |
.prompt.include-raw-session-id | boolean | false | Include raw session id in prompt |
.prompt.include-full-user-agent | boolean | false | Include full user agent in prompt |
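For orientation, here is how the Layer 1 entries nest in YAML, assuming the table's property names are relative to spring.ai.security.tiered.layer1 (the prefix used in the tuning scenarios earlier on this page). All values are the defaults from the table, so this fragment is a starting point rather than a recommendation.

```yaml
spring:
  ai:
    security:
      tiered:
        layer1:
          default-budget-profile: CORTEX_L1_INTERACTIVE_STRICT
          vector-search-limit: 3
          rag:
            similarity-threshold: 0.5
          session:
            max-recent-actions: 100
          cache:
            max-size: 1000
            ttl-minutes: 30
          timeout:
            total-ms: 5000
            llm-ms: 3200
            rag-ms: 900
          prompt:
            max-similar-events: 2
            max-rag-documents: 3
            # Raw identifiers stay out of the prompt by default.
            include-event-id: false
            include-raw-timestamp: false
            include-raw-session-id: false
            include-full-user-agent: false
```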
TieredStrategyProperties — Layer 2 and Shared Settings
Property names in this table are relative to spring.ai.security.tiered.
| Property | Type | Default | Description |
|---|---|---|---|
.layer2.rag.similarity-threshold | double | 0.5 | Layer 2 RAG similarity threshold |
.layer2.cache.max-size | int | 1000 | Layer 2 cache size |
.layer2.cache.ttl-minutes | int | 30 | Layer 2 cache TTL |
.layer2.timeout-ms | long | 7000 | Layer 2 timeout (ms) |
.layer2.enable-soar | boolean | false | SOAR activation toggle |
.layer2.rag-top-k | int | 5 | Layer 2 retrieval size |
.layer2.default-budget-profile | String | CORTEX_L2_EXPERT_STRICT | Named Layer 2 budget profile |
.truncation.layer1.user-agent | int | 150 | Layer 1 user-agent truncation |
.truncation.layer1.payload | int | 200 | Layer 1 payload truncation |
.truncation.layer1.rag-document | int | 180 | Layer 1 RAG truncation |
.truncation.layer2.user-agent | int | 150 | Layer 2 user-agent truncation |
.truncation.layer2.payload | int | 1000 | Layer 2 payload truncation |
.truncation.layer2.rag-document | int | 500 | Layer 2 RAG truncation |
.vector-cache.max-size | int | 10000 | Vector cache size |
.vector-cache.expire-minutes | int | 5 | Vector cache TTL |
.vector-cache.enabled | boolean | true | Vector cache toggle |
.vector-cache.record-stats | boolean | true | Vector cache metrics toggle |
.security.trusted-proxies | List<String> | [] | Trusted proxy list |
.security.trusted-proxy-validation-enabled | boolean | true | Trusted proxy validation toggle |
.prompt-compression.enabled | boolean | true | Runtime prompt compression toggle |
.prompt-runtime.native-structured-output-enabled | boolean | true | Enable native structured output |
.prompt-runtime.native-structured-output-disabled-profiles | List<String> | [] | Profiles where native structured output is disabled |
.prompt-runtime.telemetry-enabled | boolean | true | Enable prompt runtime telemetry |
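A sketch of the Layer 2 and shared settings in YAML, assuming these names are relative to spring.ai.security.tiered. The values are the table defaults except for the proxy entry, which is a hypothetical placeholder address. The accepted entry format (single address or CIDR range) is not stated in the table, so check it before relying on this.

```yaml
spring:
  ai:
    security:
      tiered:
        layer2:
          timeout-ms: 7000
          rag-top-k: 5
          enable-soar: false
          default-budget-profile: CORTEX_L2_EXPERT_STRICT
        truncation:
          layer2:
            user-agent: 150
            payload: 1000
            rag-document: 500
        security:
          trusted-proxy-validation-enabled: true
          trusted-proxies:
            - 10.0.0.10          # hypothetical reverse-proxy address
        prompt-compression:
          enabled: true
        prompt-runtime:
          native-structured-output-enabled: true
          telemetry-enabled: true
```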
SecurityMappingProperties — spring.ai.security.mapping
| Property | Type | Default | Description |
|---|---|---|---|
.task-to-tier | Map<String, Integer> | {} | Direct task name → tier (1/2/3) mapping |
.task-to-analysis-level | Map<String, String> | {} | Task name → analysis level (QUICK/NORMAL/DEEP) mapping |
.task-configs | Map<String, TaskConfig> | {} | Per-task detailed config (tier, analysisLevel, toolExecutionEnabled, requireFastResponse, preferLocalModel/CloudModel, temperature, timeoutMs, preferredModel, metadata) |
.defaults.tier1-tasks | String[] | [THREAT_FILTERING, QUICK_DETECTION] | Default tier-1 task names |
.defaults.tier2-tasks | String[] | [CONTEXTUAL_ANALYSIS, BEHAVIOR_ANALYSIS, CORRELATION] | Default tier-2 task names |
.defaults.tier3-tasks | String[] | [EXPERT_INVESTIGATION, INCIDENT_RESPONSE, FORENSIC_ANALYSIS, SOAR_AUTOMATION, APPROVAL_WORKFLOW] | Default tier-3 task names |
.defaults.default-tier | Integer | 2 | Default tier for unmapped tasks |
.defaults.default-analysis-level | String | NORMAL | Default analysis level for unmapped tasks |
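No example accompanies this table, so here is a minimal sketch of how the mapping might look in YAML. The task names come from the defaults above. The tier and level assignments, the timeout value, and the kebab-case spelling of the TaskConfig fields (relying on Spring Boot relaxed binding) are illustrative assumptions. Map keys are wrapped in brackets, the Spring Boot notation for preserving a key exactly, so that underscores and casing survive binding.

```yaml
spring:
  ai:
    security:
      mapping:
        task-to-tier:
          "[THREAT_FILTERING]": 1
          "[FORENSIC_ANALYSIS]": 3
        task-to-analysis-level:
          "[THREAT_FILTERING]": QUICK
          "[FORENSIC_ANALYSIS]": DEEP
        task-configs:
          "[INCIDENT_RESPONSE]":
            tier: 3
            analysis-level: DEEP
            tool-execution-enabled: true
            timeout-ms: 60000      # illustrative value, not a documented default
        defaults:
          default-tier: 2
          default-analysis-level: NORMAL
```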
ContexaLlmBindingProperties — contexa.llm.bindings
| Property | Type | Default | Description |
|---|---|---|---|
.chat | Map<String, Binding> | {} | Chat model bindings (key = logical name) |
.embedding | Map<String, Binding> | {} | Embedding model bindings (key = logical name) |
Binding fields | |||
.bean-name | String | "" | Spring AI bean name to reference |
.provider | String | "" | Provider identifier (ollama/openai/anthropic, etc.) |
.model-id | String | "" | Model identifier (e.g., qwen2.5:7b) |
.aliases | List<String> | [] | Model aliases |
.enabled | boolean | true | Binding enabled flag |
.primary | boolean | false | Primary binding for this kind |
contexa:
  llm:
    bindings:
      chat:
        primary-ollama:
          bean-name: contexaOllamaChatModel
          provider: ollama
          model-id: qwen2.5:7b
          aliases: [default, fast]
          enabled: true
          primary: true
      embedding:
        primary-ollama:
          bean-name: contexaSharedOllamaEmbeddingModel
          provider: ollama
          model-id: mxbai-embed-large
          enabled: true
          primary: true
Contexa RAG, Advisor, Streaming, and PgVector
| Property | Default | Description |
|---|---|---|
contexa.rag.behavior.lookback-days | 30 | Behavioral retrieval lookback window |
contexa.rag.risk.similarity-threshold | 0.8 | Risk retrieval threshold |
contexa.rag.risk.top-k | 50 | Risk retrieval size |
contexa.rag.etl.batch-size | 100 | ETL batch size |
contexa.rag.etl.chunk-size | 500 | ETL chunk size |
contexa.rag.etl.chunk-overlap | 50 | ETL chunk overlap |
contexa.rag.etl.behavior.retention-days | 90 | Behavior retention period |
contexa.advisor.chain-profile | STANDARD | Advisor chain profile |
contexa.advisor.soar.approval.enabled | true | SOAR approval advisor toggle |
contexa.advisor.soar.approval.order | 100 | SOAR approval advisor order |
contexa.advisor.soar.approval.timeout | 300 | SOAR approval timeout |
contexa.streaming.timeout | PT5M | Streaming timeout |
contexa.streaming.max-retries | 3 | Streaming retry count |
contexa.streaming.retry-delay | PT1S | Streaming retry delay |
contexa.streaming.retry-multiplier | 1.5 | Streaming retry multiplier |
spring.ai.vectorstore.pgvector.dimensions | 1024 | Embedding dimension |
spring.ai.vectorstore.pgvector.parallel-threads | 4 | Parallel worker count |
spring.ai.vectorstore.pgvector.search-timeout-ms | 10000 | Search timeout |
spring.ai.vectorstore.pgvector.store-timeout-ms | 10000 | Store timeout |
spring.ai.vectorstore.pgvector.document.chunk-size | 1000 | Document chunk size |
spring.ai.vectorstore.pgvector.document.chunk-overlap | 200 | Document chunk overlap |