# AI Engine Configuration
Configuration properties for the Contexa AI engine, including the tiered LLM strategy, RAG, advisor chain, streaming pipeline, and tuning scenarios.
## Quick Start: Minimal Config

The minimum configuration to run the OSS AI path with the built-in Ollama chat runtime and the tiered security models:

```yaml
contexa:
  llm:
    chat:
      ollama:
        base-url: http://localhost:11434
        model: qwen2.5:14b
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:14b
      layer2:
        model: exaone3.5:latest
    vectorstore:
      pgvector:
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1024
```
This enables the Ollama chat runtime through contexa.llm.chat.ollama.*, assigns Tier 1 and Tier 2 models, and configures the project-declared pgvector properties. If you want a dedicated embedding runtime, add contexa.llm.embedding.ollama.* as shown below.
## Configuration Architecture
The AI runtime is configured through seven property groups. Contexa owns the runtime selection, advisor, RAG, and streaming properties, while Spring AI owns the provider integrations and pgvector base configuration.
| Property Prefix | Bound To | Configures |
|---|---|---|
contexa.llm.* | ContexaProperties | Chat runtime selection, Ollama endpoints, model priority, dedicated embedding runtime |
spring.ai.security.* | TieredLLMProperties | Tier 1 and Tier 2 model selection, backup models, helper timeout accessors |
spring.ai.security.tiered.* | TieredStrategyProperties | Prompt budgets, truncation, vector cache, trusted proxy validation, RAG thresholds |
contexa.advisor.* | ContexaAdvisorProperties | Advisor chain profile, security advisor order, SOAR approval advisor settings |
contexa.rag.* | ContexaRagProperties | Default, behavior, risk, AI Lab, and ETL retrieval settings |
contexa.streaming.* | StreamingProperties | Protocol markers, retry policy, parser buffers, streaming timeout |
spring.ai.vectorstore.pgvector.* | PgVectorStoreProperties | Index type, dimensions, search/store limits, document chunking, HNSW/IVFFLAT tuning |
## LLM Model Configuration
The tiered model system is split between runtime selection in contexa.llm.* and tier assignment in spring.ai.security.*. The OSS core ships with Ollama chat runtime wiring and can also use Spring AI OpenAI or Anthropic providers if those beans are present.
| Property | Default | Description |
|---|---|---|
contexa.llm.enabled | true | Master flag for Contexa LLM-dependent features |
contexa.llm.advisor-enabled | true | Enables the advisor chain registration path |
contexa.llm.chat-model-priority | ollama,anthropic,openai | Provider preference order for the primary chat model |
contexa.llm.embedding-model-priority | ollama,openai | Provider preference order for the primary embedding model |
contexa.llm.chat.ollama.base-url | "" | Required to enable the built-in Ollama chat runtime |
contexa.llm.chat.ollama.model | "" | Optional explicit Ollama chat model; if omitted the auto-configuration falls back to qwen3:8b |
contexa.llm.chat.ollama.keep-alive | "" | Optional Ollama keep-alive hint passed to chat options |
contexa.llm.embedding.ollama.dedicated-runtime-enabled | false | Enables a dedicated Ollama embedding runtime instead of reusing the chat runtime |
contexa.llm.embedding.ollama.base-url | "" | Required when the dedicated embedding runtime is enabled |
contexa.llm.embedding.ollama.model | "" | Optional explicit embedding model; if omitted the auto-configuration falls back to mxbai-embed-large |
spring.ai.security.layer1.model | qwen2.5:14b | Tier 1 model selection |
spring.ai.security.layer1.backup.model | — | Tier 1 backup model |
spring.ai.security.layer2.model | exaone3.5:latest | Tier 2 model selection |
spring.ai.security.layer2.backup.model | — | Tier 2 backup model |
spring.ai.security.tiered.layer1.timeout-ms | 30000 | Tier 1 inference timeout used by TieredLLMProperties when the value is unset or invalid |
spring.ai.security.tiered.layer2.timeout-ms | 60000 | Tier 2 inference timeout used by TieredLLMProperties when the value is unset or invalid |
```yaml
contexa:
  llm:
    chat-model-priority: ollama,anthropic,openai
    embedding-model-priority: ollama,openai
    chat:
      ollama:
        base-url: http://localhost:11434
        model: qwen2.5:14b
    embedding:
      ollama:
        dedicated-runtime-enabled: true
        base-url: http://localhost:11435
        model: mxbai-embed-large
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:14b
        backup:
          model: llama3.2:latest
      layer2:
        model: exaone3.5:latest
        backup:
          model: deepseek-r1:14b
      tiered:
        layer1:
          timeout-ms: 30000
        layer2:
          timeout-ms: 60000
```
OpenAI and Anthropic provider credentials are configured through Spring AI's own provider properties such as spring.ai.openai.* and spring.ai.anthropic.*; Contexa only selects between the available provider beans.
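As a minimal sketch of those Spring AI provider properties (the `api-key` keys are standard Spring AI settings; the environment-variable placeholders are illustrative):

```yaml
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
    anthropic:
      api-key: ${ANTHROPIC_API_KEY}
```

With either key present and the corresponding provider bean on the classpath, the `chat-model-priority` order decides which provider Contexa actually uses.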
See also: LLM & Models Reference
## RAG Configuration

These properties drive the retrieval path used by the pipeline and AI Lab. The OSS code binds contexa.rag.* to ContexaRagProperties and the pgvector settings to spring.ai.vectorstore.pgvector.*.
| Property | Default | Description |
|---|---|---|
contexa.rag.defaults.similarity-threshold | 0.7 | Default similarity threshold for general retrieval |
contexa.rag.defaults.top-k | 10 | Default retrieval size for general retrieval |
contexa.rag.behavior.lookback-days | 30 | Behavioral retrieval lookback window |
contexa.rag.risk.similarity-threshold | 0.8 | Risk-focused retrieval threshold |
contexa.rag.risk.top-k | 50 | Risk-focused retrieval result count |
contexa.rag.lab.batch-size | 50 | AI Lab batch size |
contexa.rag.lab.validation-enabled | true | Enables validation before ingestion |
contexa.rag.lab.enrichment-enabled | true | Enables metadata enrichment during AI Lab processing |
contexa.rag.lab.top-k | 100 | AI Lab retrieval size |
contexa.rag.lab.similarity-threshold | 0.75 | AI Lab similarity threshold |
contexa.rag.etl.batch-size | 100 | ETL ingestion batch size |
contexa.rag.etl.chunk-size | 500 | Chunk size for document splitting |
contexa.rag.etl.chunk-overlap | 50 | Chunk overlap for document splitting |
contexa.rag.etl.vector-table-name | vector_store | Logical target table name used by ETL workflows |
contexa.rag.etl.behavior.retention-days | 90 | Behavioral corpus retention period |
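The ETL and behavior settings from the table bind under contexa.rag.etl and contexa.rag.behavior; a sketch that simply spells out the defaults listed above:

```yaml
contexa:
  rag:
    behavior:
      lookback-days: 30
    etl:
      batch-size: 100
      chunk-size: 500
      chunk-overlap: 50
      vector-table-name: vector_store
      behavior:
        retention-days: 90
```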
### Vector Store (pgvector)
| Property | Default | Description |
|---|---|---|
spring.ai.vectorstore.pgvector.index-type | HNSW | Index implementation |
spring.ai.vectorstore.pgvector.distance-type | COSINE_DISTANCE | Distance metric |
spring.ai.vectorstore.pgvector.dimensions | 1024 | Embedding dimension bound expected by the store |
spring.ai.vectorstore.pgvector.batch-size | 100 | Store write batch size |
spring.ai.vectorstore.pgvector.parallel-threads | 4 | Parallel worker count |
spring.ai.vectorstore.pgvector.top-k | 100 | Default search result count |
spring.ai.vectorstore.pgvector.similarity-threshold | 0.5 | Default search similarity threshold |
spring.ai.vectorstore.pgvector.search-timeout-ms | 10000 | Search timeout |
spring.ai.vectorstore.pgvector.store-timeout-ms | 10000 | Store timeout |
spring.ai.vectorstore.pgvector.hnsw.m | 16 | HNSW graph connectivity |
spring.ai.vectorstore.pgvector.hnsw.ef-construction | 64 | HNSW construction effort |
spring.ai.vectorstore.pgvector.hnsw.ef-search | 100 | HNSW search effort |
spring.ai.vectorstore.pgvector.ivfflat.lists | 100 | IVFFLAT list count |
spring.ai.vectorstore.pgvector.ivfflat.probes | 10 | IVFFLAT probe count |
spring.ai.vectorstore.pgvector.document.chunk-size | 1000 | Document chunk size |
spring.ai.vectorstore.pgvector.document.chunk-overlap | 200 | Document chunk overlap |
spring.ai.vectorstore.pgvector.document.enrich-metadata | true | Metadata enrichment toggle |
spring.ai.vectorstore.pgvector.document.extract-keywords | true | Keyword extraction toggle |
spring.ai.vectorstore.pgvector.document.generate-summary | false | Document summary generation toggle |
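The index tuning properties nest under hnsw or ivfflat, and only the group matching index-type is relevant. A sketch that trades search latency for higher HNSW recall (the raised ef values are illustrative, not recommendations):

```yaml
spring:
  ai:
    vectorstore:
      pgvector:
        index-type: HNSW
        hnsw:
          m: 16
          ef-construction: 128
          ef-search: 200
```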
```yaml
contexa:
  rag:
    defaults:
      similarity-threshold: 0.7
      top-k: 10
    risk:
      similarity-threshold: 0.8
      top-k: 50
    lab:
      batch-size: 50
      validation-enabled: true
      enrichment-enabled: true
spring:
  ai:
    vectorstore:
      pgvector:
        dimensions: 1024
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        document:
          chunk-size: 1000
          chunk-overlap: 200
```
See also: Pipeline & RAG Reference
## Streaming Configuration
| Property | Default | Description |
|---|---|---|
contexa.streaming.final-response-marker | ###FINAL_RESPONSE### | Marker emitted before the final structured response |
contexa.streaming.streaming-marker | ###STREAMING### | Marker prefix for streaming chunks |
contexa.streaming.json-start-marker | ===JSON_START=== | Marker indicating JSON output start |
contexa.streaming.json-end-marker | ===JSON_END=== | Marker indicating JSON output end |
contexa.streaming.timeout | PT5M | Overall streaming timeout |
contexa.streaming.max-retries | 3 | Retry count for stream recovery |
contexa.streaming.retry-delay | PT1S | Initial retry delay |
contexa.streaming.retry-multiplier | 1.5 | Retry backoff multiplier |
contexa.streaming.marker-buffer-size | 100 | Marker parsing buffer size |
contexa.streaming.sentence-buffering-enabled | true | Sentence buffering toggle for chunk smoothing |
```yaml
contexa:
  streaming:
    timeout: 5m
    max-retries: 3
    retry-delay: 1s
    retry-multiplier: 1.5
    marker-buffer-size: 100
    sentence-buffering-enabled: true
```
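If the default markers could collide with model output, they can be overridden through the kebab-case keys from the table above; this sketch just restates the defaults as a template:

```yaml
contexa:
  streaming:
    final-response-marker: "###FINAL_RESPONSE###"
    streaming-marker: "###STREAMING###"
    json-start-marker: "===JSON_START==="
    json-end-marker: "===JSON_END==="
```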
See also: Streaming Reference
## Advisor Configuration
| Property | Default | Description |
|---|---|---|
contexa.advisor.chain-profile | STANDARD | Named advisor chain profile |
contexa.advisor.security.enabled | true | Enable the security advisor |
contexa.advisor.security.order | 50 | Execution order of the security advisor |
contexa.advisor.security.require-authentication | false | Require an authenticated principal before AI processing |
contexa.advisor.soar.approval.enabled | true | Enable the SOAR approval advisor |
contexa.advisor.soar.approval.order | 100 | Execution order of the SOAR approval advisor |
contexa.advisor.soar.approval.timeout | 300 | Approval timeout in seconds |
```yaml
contexa:
  advisor:
    chain-profile: STANDARD
    security:
      enabled: true
      order: 50
      require-authentication: false
    soar:
      approval:
        enabled: true
        order: 100
        timeout: 300
```
See also: Advisor System Reference
## Tuning Scenarios
Common configuration adjustments for specific situations:
### Slow Responses

If inference is slow, reduce the model footprint and widen the Layer 1 pipeline budget:

```yaml
contexa:
  llm:
    chat:
      ollama:
        model: qwen2.5:7b
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:7b
      tiered:
        layer1:
          timeout:
            total-ms: 5400000
            llm-ms: 3600000
            rag-ms: 1200000
```
### RAG Results Are Inaccurate

If retrieved context is not relevant enough, tighten the default threshold and reduce the retrieval window:

```yaml
contexa:
  rag:
    defaults:
      similarity-threshold: 0.85
      top-k: 3
spring:
  ai:
    security:
      tiered:
        layer1:
          rag:
            similarity-threshold: 0.7
          vector-search-limit: 6
```
### Token Usage / Cost Optimization

Reduce prompt size by lowering the Layer 1 prompt and truncation budgets:

```yaml
spring:
  ai:
    security:
      tiered:
        layer1:
          prompt:
            max-rag-documents: 2
            max-similar-events: 1
          vector-search-limit: 3
        truncation:
          layer1:
            payload: 100
            rag-document: 150
```
### Production Deployment

Recommended settings for a production deployment with separate chat and embedding runtimes:

```yaml
contexa:
  llm:
    chat-model-priority: ollama,anthropic,openai
    embedding-model-priority: ollama,openai
    embedding:
      ollama:
        dedicated-runtime-enabled: true
        base-url: http://localhost:11435
  advisor:
    security:
      enabled: true
      require-authentication: true
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:14b
      layer2:
        model: exaone3.5:latest
      tiered:
        vector-cache:
          max-size: 50000
          expire-minutes: 10
          enabled: true
          record-stats: true
        security:
          trusted-proxy-validation-enabled: true
```
## Complete Property Reference
### TieredStrategyProperties — Layer 1 Settings

Property paths below are relative to the spring.ai.security.tiered.layer1 prefix, matching the tuning scenarios above.
| Property | Type | Default | Description |
|---|---|---|---|
.rag.similarity-threshold | double | 0.5 | Layer 1 RAG similarity threshold |
.session.max-recent-actions | int | 100 | Recent action window |
.cache.max-size | int | 1000 | Layer 1 cache size |
.cache.ttl-minutes | int | 30 | Layer 1 cache TTL |
.timeout.total-ms | long | 4500000 | Total Layer 1 budget |
.timeout.llm-ms | long | 3000000 | Layer 1 LLM budget |
.timeout.rag-ms | long | 1000000 | Layer 1 RAG budget |
.vector-search-limit | int | 12 | Vector search result cap |
.default-budget-profile | String | CORTEX_L1_STANDARD | Named Layer 1 budget profile |
.prompt.max-similar-events | int | 3 | Prompt similar-event cap |
.prompt.max-rag-documents | int | 12 | Prompt RAG document cap |
.prompt.include-event-id | boolean | false | Include event id in prompt |
.prompt.include-raw-timestamp | boolean | false | Include raw timestamp in prompt |
.prompt.include-raw-session-id | boolean | false | Include raw session id in prompt |
.prompt.include-full-user-agent | boolean | false | Include full user agent in prompt |
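As a sketch, the Layer 1 settings above bind like this under application YAML (prefix assumed from the tuning scenarios in this document; values are the listed defaults):

```yaml
spring:
  ai:
    security:
      tiered:
        layer1:
          rag:
            similarity-threshold: 0.5
          session:
            max-recent-actions: 100
          cache:
            max-size: 1000
            ttl-minutes: 30
          vector-search-limit: 12
          prompt:
            max-similar-events: 3
            max-rag-documents: 12
```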
### TieredStrategyProperties — Layer 2 and Shared Settings

Property paths below are relative to the spring.ai.security.tiered prefix.
| Property | Type | Default | Description |
|---|---|---|---|
.layer2.rag.similarity-threshold | double | 0.5 | Layer 2 RAG similarity threshold |
.layer2.cache.max-size | int | 1000 | Layer 2 cache size |
.layer2.cache.ttl-minutes | int | 30 | Layer 2 cache TTL |
.layer2.timeout-ms | long | 100000 | Layer 2 timeout |
.layer2.enable-soar | boolean | false | SOAR activation toggle |
.layer2.rag-top-k | int | 10 | Layer 2 retrieval size |
.layer2.default-budget-profile | String | CORTEX_L2_STANDARD | Named Layer 2 budget profile |
.truncation.layer1.user-agent | int | 150 | Layer 1 user-agent truncation |
.truncation.layer1.payload | int | 200 | Layer 1 payload truncation |
.truncation.layer1.rag-document | int | 300 | Layer 1 RAG truncation |
.truncation.layer2.user-agent | int | 150 | Layer 2 user-agent truncation |
.truncation.layer2.payload | int | 1000 | Layer 2 payload truncation |
.truncation.layer2.rag-document | int | 500 | Layer 2 RAG truncation |
.vector-cache.max-size | int | 10000 | Vector cache size |
.vector-cache.expire-minutes | int | 5 | Vector cache TTL |
.vector-cache.enabled | boolean | true | Vector cache toggle |
.vector-cache.record-stats | boolean | true | Vector cache metrics toggle |
.security.trusted-proxies | List<String> | [] | Trusted proxy list |
.security.trusted-proxy-validation-enabled | boolean | true | Trusted proxy validation toggle |
.prompt-compression.enabled | boolean | true | Runtime prompt compression toggle |
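Similarly, a sketch of the Layer 2 and shared settings under spring.ai.security.tiered (prefix assumed from the production example above; the CIDR entry under trusted-proxies is illustrative):

```yaml
spring:
  ai:
    security:
      tiered:
        layer2:
          timeout-ms: 100000
          rag-top-k: 10
          enable-soar: false
        vector-cache:
          max-size: 10000
          expire-minutes: 5
          enabled: true
        security:
          trusted-proxies:
            - 10.0.0.0/8
          trusted-proxy-validation-enabled: true
```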
### Contexa RAG, Advisor, Streaming, and PgVector
| Property | Default | Description |
|---|---|---|
contexa.rag.behavior.lookback-days | 30 | Behavioral retrieval lookback window |
contexa.rag.risk.similarity-threshold | 0.8 | Risk retrieval threshold |
contexa.rag.risk.top-k | 50 | Risk retrieval size |
contexa.rag.etl.batch-size | 100 | ETL batch size |
contexa.rag.etl.chunk-size | 500 | ETL chunk size |
contexa.rag.etl.chunk-overlap | 50 | ETL chunk overlap |
contexa.rag.etl.behavior.retention-days | 90 | Behavior retention period |
contexa.advisor.chain-profile | STANDARD | Advisor chain profile |
contexa.advisor.soar.approval.enabled | true | SOAR approval advisor toggle |
contexa.advisor.soar.approval.order | 100 | SOAR approval advisor order |
contexa.advisor.soar.approval.timeout | 300 | SOAR approval timeout |
contexa.streaming.timeout | PT5M | Streaming timeout |
contexa.streaming.max-retries | 3 | Streaming retry count |
contexa.streaming.retry-delay | PT1S | Streaming retry delay |
contexa.streaming.retry-multiplier | 1.5 | Streaming retry multiplier |
spring.ai.vectorstore.pgvector.dimensions | 1024 | Embedding dimension |
spring.ai.vectorstore.pgvector.parallel-threads | 4 | Parallel worker count |
spring.ai.vectorstore.pgvector.search-timeout-ms | 10000 | Search timeout |
spring.ai.vectorstore.pgvector.store-timeout-ms | 10000 | Store timeout |
spring.ai.vectorstore.pgvector.document.chunk-size | 1000 | Document chunk size |
spring.ai.vectorstore.pgvector.document.chunk-overlap | 200 | Document chunk overlap |