# AI Engine Configuration
Configuration properties for the Contexa AI engine, including the tiered LLM strategy, RAG, advisor chain, streaming pipeline, and tuning scenarios.
## Quick Start: Minimal Config

The minimum configuration to run the OSS AI path with the built-in Ollama chat runtime and the tiered security models:

```yaml
contexa:
  llm:
    chat:
      ollama:
        base-url: http://localhost:11434
        model: qwen2.5:14b
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:14b
      layer2:
        model: exaone3.5:latest
    vectorstore:
      pgvector:
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1024
```
This enables the Ollama chat runtime through contexa.llm.chat.ollama.*, assigns Tier 1 and Tier 2 models, and configures the project-declared pgvector properties. If you want a dedicated embedding runtime, add contexa.llm.embedding.ollama.* as shown below.
## Configuration Architecture
The AI runtime is configured through seven property groups. Contexa owns the runtime selection, advisor, RAG, and streaming properties, while Spring AI owns the provider integrations and pgvector base configuration.
| Property Prefix | Bound To | Configures |
|---|---|---|
contexa.llm.* | ContexaProperties | Chat runtime selection, Ollama endpoints, model priority, dedicated embedding runtime |
spring.ai.security.* | TieredLLMProperties | Tier 1 and Tier 2 model selection, backup models, helper timeout accessors |
spring.ai.security.tiered.* | TieredStrategyProperties | Prompt budgets, truncation, vector cache, trusted proxy validation, RAG thresholds |
contexa.advisor.* | ContexaAdvisorProperties | Advisor chain profile, security advisor order, SOAR approval advisor settings |
contexa.rag.* | ContexaRagProperties | Default, behavior, risk, AI Lab, and ETL retrieval settings |
contexa.streaming.* | StreamingProperties | Protocol markers, retry policy, parser buffers, streaming timeout |
spring.ai.vectorstore.pgvector.* | PgVectorStoreProperties | Index type, dimensions, search/store limits, document chunking, HNSW/IVFFLAT tuning |
## LLM Model Configuration
The tiered model system is split between runtime selection in contexa.llm.* and tier assignment in spring.ai.security.*. The OSS core ships with Ollama chat runtime wiring and can also use Spring AI OpenAI or Anthropic providers if those beans are present.
| Property | Default | Description |
|---|---|---|
contexa.llm.enabled | true | Master flag for Contexa LLM-dependent features |
contexa.llm.advisor-enabled | true | Enables the advisor chain registration path |
contexa.llm.chat-model-priority | ollama,anthropic,openai | Provider preference order for the primary chat model |
contexa.llm.embedding-model-priority | ollama,openai | Provider preference order for the primary embedding model |
contexa.llm.chat.ollama.base-url | "" | Required to enable the built-in Ollama chat runtime |
contexa.llm.chat.ollama.model | "" | Optional explicit Ollama chat model; if omitted the auto-configuration falls back to qwen3:8b |
contexa.llm.chat.ollama.keep-alive | "" | Optional Ollama keep-alive hint passed to chat options |
contexa.llm.embedding.ollama.dedicated-runtime-enabled | false | Enables a dedicated Ollama embedding runtime instead of reusing the chat runtime |
contexa.llm.embedding.ollama.base-url | "" | Required when the dedicated embedding runtime is enabled |
contexa.llm.embedding.ollama.model | "" | Optional explicit embedding model; if omitted the auto-configuration falls back to mxbai-embed-large |
spring.ai.security.layer1.model | qwen2.5:14b | Tier 1 model selection |
spring.ai.security.layer1.backup.model | — | Tier 1 backup model |
spring.ai.security.layer2.model | exaone3.5:latest | Tier 2 model selection |
spring.ai.security.layer2.backup.model | — | Tier 2 backup model |
spring.ai.security.tiered.layer1.timeout-ms | 30000 | Tier 1 inference timeout used by TieredLLMProperties when the value is unset or invalid |
spring.ai.security.tiered.layer2.timeout-ms | 60000 | Tier 2 inference timeout used by TieredLLMProperties when the value is unset or invalid |
```yaml
contexa:
  llm:
    chat-model-priority: ollama,anthropic,openai
    embedding-model-priority: ollama,openai
    chat:
      ollama:
        base-url: http://localhost:11434
        model: qwen2.5:14b
    embedding:
      ollama:
        dedicated-runtime-enabled: true
        base-url: http://localhost:11435
        model: mxbai-embed-large
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:14b
        backup:
          model: llama3.2:latest
      layer2:
        model: exaone3.5:latest
        backup:
          model: deepseek-r1:14b
      tiered:
        layer1:
          timeout-ms: 30000
        layer2:
          timeout-ms: 60000
```
OpenAI and Anthropic provider credentials are configured through Spring AI's own provider properties such as spring.ai.openai.* and spring.ai.anthropic.*; Contexa only selects between the available provider beans.
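As a minimal sketch of those Spring AI provider properties (the `api-key` keys are standard Spring AI settings; the environment-variable placeholders are illustrative):

```yaml
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
    anthropic:
      api-key: ${ANTHROPIC_API_KEY}
```

With either key present and the corresponding provider bean on the classpath, the `chat-model-priority` order decides which provider Contexa actually uses.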
See also: LLM & Models Reference
## RAG Configuration

These properties drive the retrieval path used by the pipeline and AI Lab. The OSS code binds contexa.rag.* to ContexaRagProperties and the pgvector settings to spring.ai.vectorstore.pgvector.*.
| Property | Default | Description |
|---|---|---|
contexa.rag.defaults.similarity-threshold | 0.7 | Default similarity threshold for general retrieval |
contexa.rag.defaults.top-k | 10 | Default retrieval size for general retrieval |
contexa.rag.behavior.lookback-days | 30 | Behavioral retrieval lookback window |
contexa.rag.risk.similarity-threshold | 0.8 | Risk-focused retrieval threshold |
contexa.rag.risk.top-k | 50 | Risk-focused retrieval result count |
contexa.rag.lab.batch-size | 50 | AI Lab batch size |
contexa.rag.lab.validation-enabled | true | Enables validation before ingestion |
contexa.rag.lab.enrichment-enabled | true | Enables metadata enrichment during AI Lab processing |
contexa.rag.lab.top-k | 100 | AI Lab retrieval size |
contexa.rag.lab.similarity-threshold | 0.75 | AI Lab similarity threshold |
contexa.rag.etl.batch-size | 100 | ETL ingestion batch size |
contexa.rag.etl.chunk-size | 500 | Chunk size for document splitting |
contexa.rag.etl.chunk-overlap | 50 | Chunk overlap for document splitting |
contexa.rag.etl.vector-table-name | vector_store | Logical target table name used by ETL workflows |
contexa.rag.etl.behavior.retention-days | 90 | Behavioral corpus retention period |
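The ETL and behavior settings from the table bind under contexa.rag.etl and contexa.rag.behavior; a sketch that simply spells out the defaults listed above:

```yaml
contexa:
  rag:
    behavior:
      lookback-days: 30
    etl:
      batch-size: 100
      chunk-size: 500
      chunk-overlap: 50
      vector-table-name: vector_store
      behavior:
        retention-days: 90
```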
### Vector Store (pgvector)
| Property | Default | Description |
|---|---|---|
spring.ai.vectorstore.pgvector.index-type | HNSW | Index implementation |
spring.ai.vectorstore.pgvector.distance-type | COSINE_DISTANCE | Distance metric |
spring.ai.vectorstore.pgvector.dimensions | 1024 | Embedding dimension bound expected by the store |
spring.ai.vectorstore.pgvector.batch-size | 100 | Store write batch size |
spring.ai.vectorstore.pgvector.parallel-threads | 4 | Parallel worker count |
spring.ai.vectorstore.pgvector.top-k | 100 | Default search result count |
spring.ai.vectorstore.pgvector.similarity-threshold | 0.5 | Default search similarity threshold |
spring.ai.vectorstore.pgvector.search-timeout-ms | 10000 | Search timeout |
spring.ai.vectorstore.pgvector.store-timeout-ms | 10000 | Store timeout |
spring.ai.vectorstore.pgvector.hnsw.m | 16 | HNSW graph connectivity |
spring.ai.vectorstore.pgvector.hnsw.ef-construction | 64 | HNSW construction effort |
spring.ai.vectorstore.pgvector.hnsw.ef-search | 100 | HNSW search effort |
spring.ai.vectorstore.pgvector.ivfflat.lists | 100 | IVFFLAT list count |
spring.ai.vectorstore.pgvector.ivfflat.probes | 10 | IVFFLAT probe count |
spring.ai.vectorstore.pgvector.document.chunk-size | 1000 | Document chunk size |
spring.ai.vectorstore.pgvector.document.chunk-overlap | 200 | Document chunk overlap |
spring.ai.vectorstore.pgvector.document.enrich-metadata | true | Metadata enrichment toggle |
spring.ai.vectorstore.pgvector.document.extract-keywords | true | Keyword extraction toggle |
spring.ai.vectorstore.pgvector.document.generate-summary | false | Document summary generation toggle |
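The index tuning properties nest under hnsw or ivfflat, and only the group matching index-type is relevant. A sketch that trades search latency for higher HNSW recall (the raised ef values are illustrative, not recommendations):

```yaml
spring:
  ai:
    vectorstore:
      pgvector:
        index-type: HNSW
        hnsw:
          m: 16
          ef-construction: 128
          ef-search: 200
```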
```yaml
contexa:
  rag:
    defaults:
      similarity-threshold: 0.7
      top-k: 10
    risk:
      similarity-threshold: 0.8
      top-k: 50
    lab:
      batch-size: 50
      validation-enabled: true
      enrichment-enabled: true
spring:
  ai:
    vectorstore:
      pgvector:
        dimensions: 1024
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        document:
          chunk-size: 1000
          chunk-overlap: 200
```
See also: Pipeline & RAG Reference
## Streaming Configuration
| Property | Default | Description |
|---|---|---|
contexa.streaming.final-response-marker | ###FINAL_RESPONSE### | Marker emitted before the final structured response |
contexa.streaming.streaming-marker | ###STREAMING### | Marker prefix for streaming chunks |
contexa.streaming.json-start-marker | ===JSON_START=== | Marker indicating JSON output start |
contexa.streaming.json-end-marker | ===JSON_END=== | Marker indicating JSON output end |
contexa.streaming.timeout | PT5M | Overall streaming timeout |
contexa.streaming.max-retries | 3 | Retry count for stream recovery |
contexa.streaming.retry-delay | PT1S | Initial retry delay |
contexa.streaming.retry-multiplier | 1.5 | Retry backoff multiplier |
contexa.streaming.marker-buffer-size | 100 | Marker parsing buffer size |
contexa.streaming.sentence-buffering-enabled | true | Sentence buffering toggle for chunk smoothing |
```yaml
contexa:
  streaming:
    timeout: 5m
    max-retries: 3
    retry-delay: 1s
    retry-multiplier: 1.5
    marker-buffer-size: 100
    sentence-buffering-enabled: true
```
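If the default markers could collide with model output, they can be overridden through the kebab-case keys from the table above; this sketch just restates the defaults as a template:

```yaml
contexa:
  streaming:
    final-response-marker: "###FINAL_RESPONSE###"
    streaming-marker: "###STREAMING###"
    json-start-marker: "===JSON_START==="
    json-end-marker: "===JSON_END==="
```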
See also: Streaming Reference
## Advisor Configuration
| Property | Default | Description |
|---|---|---|
contexa.advisor.chain-profile | STANDARD | Named advisor chain profile |
contexa.advisor.security.enabled | true | Enable the security advisor |
contexa.advisor.security.order | 50 | Execution order of the security advisor |
contexa.advisor.security.require-authentication | false | Require an authenticated principal before AI processing |
contexa.advisor.soar.approval.enabled | true | Enable the SOAR approval advisor |
contexa.advisor.soar.approval.order | 100 | Execution order of the SOAR approval advisor |
contexa.advisor.soar.approval.timeout | 300 | Approval timeout in seconds |
```yaml
contexa:
  advisor:
    chain-profile: STANDARD
    security:
      enabled: true
      order: 50
      require-authentication: false
    soar:
      approval:
        enabled: true
        order: 100
        timeout: 300
```
See also: Advisor System Reference
## Tuning Scenarios
Common configuration adjustments for specific situations:
### Slow Responses

If inference is slow, reduce the model footprint and widen the Layer 1 pipeline budget:

```yaml
contexa:
  llm:
    chat:
      ollama:
        model: qwen2.5:7b
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:7b
      tiered:
        layer1:
          timeout:
            total-ms: 5400000
            llm-ms: 3600000
            rag-ms: 1200000
```
### RAG Results Are Inaccurate

If retrieved context is not relevant enough, tighten the default threshold and reduce the retrieval window:

```yaml
contexa:
  rag:
    defaults:
      similarity-threshold: 0.85
      top-k: 3
spring:
  ai:
    security:
      tiered:
        layer1:
          rag:
            similarity-threshold: 0.7
          vector-search-limit: 6
```
### Token Usage / Cost Optimization

Reduce prompt size by lowering the Layer 1 prompt and truncation budgets:

```yaml
spring:
  ai:
    security:
      tiered:
        layer1:
          prompt:
            max-rag-documents: 2
            max-similar-events: 1
          vector-search-limit: 3
        truncation:
          layer1:
            payload: 100
            rag-document: 150
```
### Production Deployment

Recommended settings for a production deployment with separate chat and embedding runtimes:

```yaml
contexa:
  llm:
    chat-model-priority: ollama,anthropic,openai
    embedding-model-priority: ollama,openai
    embedding:
      ollama:
        dedicated-runtime-enabled: true
        base-url: http://localhost:11435
  advisor:
    security:
      enabled: true
      require-authentication: true
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:14b
      layer2:
        model: exaone3.5:latest
      tiered:
        vector-cache:
          max-size: 50000
          expire-minutes: 10
          enabled: true
          record-stats: true
        security:
          trusted-proxy-validation-enabled: true
```
## Complete Property Reference
### TieredStrategyProperties — Layer 1 Settings

Property paths below are relative to the spring.ai.security.tiered.layer1 prefix, matching the tuning scenarios above.
| Property | Type | Default | Description |
|---|---|---|---|
.rag.similarity-threshold | double | 0.5 | Layer 1 RAG similarity threshold |
.session.max-recent-actions | int | 100 | Recent action window |
.cache.max-size | int | 1000 | Layer 1 cache size |
.cache.ttl-minutes | int | 30 | Layer 1 cache TTL |
.timeout.total-ms | long | 4500000 | Total Layer 1 budget |
.timeout.llm-ms | long | 3000000 | Layer 1 LLM budget |
.timeout.rag-ms | long | 1000000 | Layer 1 RAG budget |
.vector-search-limit | int | 12 | Vector search result cap |
.default-budget-profile | String | CORTEX_L1_STANDARD | Named Layer 1 budget profile |
.prompt.max-similar-events | int | 3 | Prompt similar-event cap |
.prompt.max-rag-documents | int | 12 | Prompt RAG document cap |
.prompt.include-event-id | boolean | false | Include event id in prompt |
.prompt.include-raw-timestamp | boolean | false | Include raw timestamp in prompt |
.prompt.include-raw-session-id | boolean | false | Include raw session id in prompt |
.prompt.include-full-user-agent | boolean | false | Include full user agent in prompt |
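As a sketch, the Layer 1 settings above bind like this under application YAML (prefix assumed from the tuning scenarios in this document; values are the listed defaults):

```yaml
spring:
  ai:
    security:
      tiered:
        layer1:
          rag:
            similarity-threshold: 0.5
          session:
            max-recent-actions: 100
          cache:
            max-size: 1000
            ttl-minutes: 30
          vector-search-limit: 12
          prompt:
            max-similar-events: 3
            max-rag-documents: 12
```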
### TieredStrategyProperties — Layer 2 and Shared Settings

Property paths below are relative to the spring.ai.security.tiered prefix.
| Property | Type | Default | Description |
|---|---|---|---|
.layer2.rag.similarity-threshold | double | 0.5 | Layer 2 RAG similarity threshold |
.layer2.cache.max-size | int | 1000 | Layer 2 cache size |
.layer2.cache.ttl-minutes | int | 30 | Layer 2 cache TTL |
.layer2.timeout-ms | long | 100000 | Layer 2 timeout |
.layer2.enable-soar | boolean | false | SOAR activation toggle |
.layer2.rag-top-k | int | 10 | Layer 2 retrieval size |
.layer2.default-budget-profile | String | CORTEX_L2_STANDARD | Named Layer 2 budget profile |
.truncation.layer1.user-agent | int | 150 | Layer 1 user-agent truncation |
.truncation.layer1.payload | int | 200 | Layer 1 payload truncation |
.truncation.layer1.rag-document | int | 300 | Layer 1 RAG truncation |
.truncation.layer2.user-agent | int | 150 | Layer 2 user-agent truncation |
.truncation.layer2.payload | int | 1000 | Layer 2 payload truncation |
.truncation.layer2.rag-document | int | 500 | Layer 2 RAG truncation |
.vector-cache.max-size | int | 10000 | Vector cache size |
.vector-cache.expire-minutes | int | 5 | Vector cache TTL |
.vector-cache.enabled | boolean | true | Vector cache toggle |
.vector-cache.record-stats | boolean | true | Vector cache metrics toggle |
.security.trusted-proxies | List<String> | [] | Trusted proxy list |
.security.trusted-proxy-validation-enabled | boolean | true | Trusted proxy validation toggle |
.prompt-compression.enabled | boolean | true | Runtime prompt compression toggle |
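Similarly, a sketch of the Layer 2 and shared settings under spring.ai.security.tiered (prefix assumed from the production example above; the CIDR entry under trusted-proxies is illustrative):

```yaml
spring:
  ai:
    security:
      tiered:
        layer2:
          timeout-ms: 100000
          rag-top-k: 10
          enable-soar: false
        vector-cache:
          max-size: 10000
          expire-minutes: 5
          enabled: true
        security:
          trusted-proxies:
            - 10.0.0.0/8
          trusted-proxy-validation-enabled: true
```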
### Contexa RAG, Advisor, Streaming, and PgVector
| Property | Default | Description |
|---|---|---|
contexa.rag.behavior.lookback-days | 30 | Behavioral retrieval lookback window |
contexa.rag.risk.similarity-threshold | 0.8 | Risk retrieval threshold |
contexa.rag.risk.top-k | 50 | Risk retrieval size |
contexa.rag.etl.batch-size | 100 | ETL batch size |
contexa.rag.etl.chunk-size | 500 | ETL chunk size |
contexa.rag.etl.chunk-overlap | 50 | ETL chunk overlap |
contexa.rag.etl.behavior.retention-days | 90 | Behavior retention period |
contexa.advisor.chain-profile | STANDARD | Advisor chain profile |
contexa.advisor.soar.approval.enabled | true | SOAR approval advisor toggle |
contexa.advisor.soar.approval.order | 100 | SOAR approval advisor order |
contexa.advisor.soar.approval.timeout | 300 | SOAR approval timeout |
contexa.streaming.timeout | PT5M | Streaming timeout |
contexa.streaming.max-retries | 3 | Streaming retry count |
contexa.streaming.retry-delay | PT1S | Streaming retry delay |
contexa.streaming.retry-multiplier | 1.5 | Streaming retry multiplier |
spring.ai.vectorstore.pgvector.dimensions | 1024 | Embedding dimension |
spring.ai.vectorstore.pgvector.parallel-threads | 4 | Parallel worker count |
spring.ai.vectorstore.pgvector.search-timeout-ms | 10000 | Search timeout |
spring.ai.vectorstore.pgvector.store-timeout-ms | 10000 | Store timeout |
spring.ai.vectorstore.pgvector.document.chunk-size | 1000 | Document chunk size |
spring.ai.vectorstore.pgvector.document.chunk-overlap | 200 | Document chunk overlap |