AI Engine Configuration

Configuration properties for the Contexa AI engine, including the tiered LLM strategy, RAG, advisor chain, streaming pipeline, and tuning scenarios.

Quick Start: Minimal Config

The minimum configuration to run the OSS AI path with the built-in Ollama chat runtime and the tiered security models:

YAML
contexa:
  llm:
    chat:
      ollama:
        base-url: http://localhost:11434
        model: qwen2.5:7b
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:7b
      layer2:
        model: gpt-4o-mini
    vectorstore:
      pgvector:
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1024

This enables the Ollama chat runtime through contexa.llm.chat.ollama.*, assigns Tier 1 and Tier 2 models, and configures the project-declared pgvector properties. If you want a dedicated embedding runtime, add contexa.llm.embedding.ollama.* as shown below.

Configuration Architecture

The AI runtime is configured through seven property groups. Contexa owns the runtime selection, advisor, RAG, and streaming properties, while Spring AI owns the provider integrations and pgvector base configuration.

| Property Prefix | Bound To | Configures |
| --- | --- | --- |
| contexa.llm.* | ContexaProperties | Chat runtime selection, Ollama endpoints, model priority, dedicated embedding runtime |
| spring.ai.security.* | TieredLLMProperties | Tier 1 and Tier 2 model selection, backup models, helper timeout accessors |
| spring.ai.security.tiered.* | TieredStrategyProperties | Prompt budgets, truncation, vector cache, trusted proxy validation, RAG thresholds |
| contexa.advisor.* | ContexaAdvisorProperties | Advisor chain profile, security advisor order, SOAR approval advisor settings |
| contexa.rag.* | ContexaRagProperties | Default, behavior, risk, AI Lab, and ETL retrieval settings |
| contexa.streaming.* | StreamingProperties | Protocol markers, retry policy, parser buffers, streaming timeout |
| spring.ai.vectorstore.pgvector.* | PgVectorStoreProperties | Index type, dimensions, search/store limits, document chunking, HNSW/IVFFLAT tuning |

LLM Model Configuration

The tiered model system is split between runtime selection in contexa.llm.* and tier assignment in spring.ai.security.*. The OSS core ships with Ollama chat runtime wiring and can also use Spring AI OpenAI or Anthropic providers if those beans are present.

| Property | Default | Description |
| --- | --- | --- |
| contexa.llm.enabled | true | Master flag for Contexa LLM-dependent features |
| contexa.llm.advisor-enabled | true | Enables the advisor chain registration path |
| contexa.llm.selection.chat.mode | DYNAMIC_PRIORITY | Chat provider selection strategy: DYNAMIC_PRIORITY walks the priority list, SPRING_PRIMARY uses the Spring @Primary bean |
| contexa.llm.selection.chat.priority | "" | Comma-separated chat provider order (e.g., ollama,anthropic,openai) |
| contexa.llm.selection.embedding.mode | DYNAMIC_PRIORITY | Embedding provider selection strategy |
| contexa.llm.selection.embedding.priority | "" | Comma-separated embedding provider order (e.g., ollama,openai) |
| contexa.llm.chat.ollama.base-url | "" | Required to enable the built-in Ollama chat runtime |
| contexa.llm.chat.ollama.model | "" | Optional explicit Ollama chat model; if omitted the auto-configuration falls back to qwen2.5:7b |
| contexa.llm.chat.ollama.keep-alive | "" | Optional Ollama keep-alive hint passed to chat options |
| contexa.llm.embedding.ollama.dedicated-runtime-enabled | false | Enables a dedicated Ollama embedding runtime instead of reusing the chat runtime |
| contexa.llm.embedding.ollama.base-url | "" | Required when the dedicated embedding runtime is enabled |
| contexa.llm.embedding.ollama.model | "" | Optional explicit embedding model; if omitted the auto-configuration falls back to mxbai-embed-large |
| spring.ai.security.layer1.model | qwen2.5:7b | Tier 1 model selection |
| spring.ai.security.layer1.backup.model | | Tier 1 backup model |
| spring.ai.security.layer2.model | gpt-4o-mini | Tier 2 model selection |
| spring.ai.security.layer2.backup.model | | Tier 2 backup model |
| spring.ai.security.tiered.layer1.timeout-ms | 30000 | Tier 1 inference timeout used by TieredLLMProperties when the value is unset or invalid |
| spring.ai.security.tiered.layer2.timeout-ms | 60000 | Tier 2 inference timeout used by TieredLLMProperties when the value is unset or invalid |

YAML
contexa:
  llm:
    selection:
      chat:
        mode: DYNAMIC_PRIORITY
        priority: ollama,anthropic,openai
      embedding:
        mode: DYNAMIC_PRIORITY
        priority: ollama,openai
    chat:
      ollama:
        base-url: http://localhost:11434
        model: qwen2.5:7b
    embedding:
      ollama:
        dedicated-runtime-enabled: true
        base-url: http://localhost:11435
        model: mxbai-embed-large
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:7b
        backup:
          model: llama3.2:latest
      layer2:
        model: gpt-4o-mini
        backup:
          model: deepseek-r1:14b
      tiered:
        layer1:
          timeout-ms: 30000
        layer2:
          timeout-ms: 60000

Recommended selection (OpenAI + Anthropic with Ollama failover)

The platform can chain managed cloud providers with the local Ollama runtime so that chat calls degrade gracefully when an API key is missing or a cloud endpoint is unreachable. The example below uses OpenAI as the primary chat provider, Anthropic as the second-tier failover, and the local Ollama runtime as the offline / no-key fallback. Embedding always goes through OpenAI.

YAML
contexa:
  llm:
    selection:
      chat:
        mode: dynamic-priority
        priority: openai,anthropic,ollama
      embedding:
        mode: dynamic-priority
        priority: openai

What each key means

  • mode: dynamic-priority — the orchestrator walks the comma-separated priority list in order and selects the first provider whose ChatModel / EmbeddingModel bean is present and reachable. The runtime form dynamic-priority is the kebab-case spelling of the Mode.DYNAMIC_PRIORITY enum; both forms bind to the same value through Spring's relaxed binding.
  • priority: openai,anthropic,ollama — chat calls try OpenAI first; if the OpenAI client is missing the key or returns an unrecoverable error, the orchestrator falls back to Anthropic; if Anthropic is also unavailable, the call lands on the local Ollama runtime. List order is the failover order, so cheaper / more reliable providers should appear earlier.
  • priority: openai (embedding) — Anthropic does not ship an embedding model in Spring AI, and the embedding path is dimension-pinned to 1536 in the platform's pgvector schema, which matches OpenAI's text-embedding-3-small / text-embedding-ada-002. Mixing in an Ollama embedding model (typically 768 / 1024 dimensions) would write vectors of an incompatible width into the same column and break similarity search. Keep embeddings on a single 1536-dimension provider.
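
The failover order described in the bullets above can be pictured with a small sketch. It is not the Contexa orchestrator: the class and method names (PrioritySelectionSketch, parsePriority, selectProvider) and the "available" set are hypothetical, standing in for whatever bean-presence and reachability checks the platform actually performs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration of a DYNAMIC_PRIORITY walk; not Contexa source code.
final class PrioritySelectionSketch {

    // Splits a comma-separated priority value such as "openai,anthropic,ollama".
    static List<String> parsePriority(String priority) {
        return Arrays.stream(priority.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // Returns the first provider, in priority order, that is currently available.
    static Optional<String> selectProvider(String priority, Set<String> available) {
        return parsePriority(priority).stream()
                .filter(available::contains)
                .findFirst();
    }

    public static void main(String[] args) {
        // OpenAI is unavailable (for example, no API key), so the walk lands on Anthropic.
        Set<String> available = Set.of("anthropic", "ollama");
        System.out.println(selectProvider("openai,anthropic,ollama", available));
    }
}
```

An empty result in this sketch corresponds to the case where no provider in the list can serve the call.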

Pair the selection block with the Spring AI provider properties

The contexa.llm.selection.* block decides which provider is called, but the actual API keys and model names live under the standard Spring AI property tree:

YAML
spring:
  ai:
    retry:
      max-attempts: 1
    anthropic:
      api-key: ${ANTHROPIC_API_KEY:disabled}
      chat:
        options:
          model: claude-3-sonnet-20240229
    openai:
      api-key: ${OPENAI_API_KEY:disabled}
      base-url: https://api.openai.com
      chat:
        options:
          model: gpt-4o-mini
          temperature: 0.3

  • API key placeholders: the literal disabled default keeps the auto-configuration from registering a real client when the operator has not supplied an environment variable. Setting ANTHROPIC_API_KEY / OPENAI_API_KEY activates the corresponding provider; leaving them as disabled intentionally skips that branch and lets the next provider in the priority list take over.
  • spring.ai.retry.max-attempts is set to 1 because, with several providers chained via dynamic-priority, retrying the same failing provider inside Spring AI only adds latency. A single attempt lets the Contexa failover selector move on to the next entry.
  • Model identifiers: gpt-4o-mini is recommended for Tier 1 (low-latency Layer 1 contextual decisions) and claude-3-sonnet-20240229 for Tier 2 (forensic Layer 2 reasoning). Swap in newer model versions as they become available.
  • temperature: 0.3 on the OpenAI side is intentional: security-decision prompts benefit from a low temperature, and raising it makes BLOCK / ESCALATE decisions near the boundary less consistent.

OpenAI and Anthropic provider credentials are configured through Spring AI's own provider properties such as spring.ai.openai.* and spring.ai.anthropic.*; Contexa only selects between the available provider beans.

See also: LLM & Models Reference

RAG Configuration

These properties drive the retrieval path used by the pipeline and AI Lab. The OSS code binds contexa.rag.* to ContexaRagProperties, and binds pgvector settings to spring.ai.vectorstore.pgvector.*.

| Property | Default | Description |
| --- | --- | --- |
| contexa.rag.defaults.similarity-threshold | 0.7 | Default similarity threshold for general retrieval |
| contexa.rag.defaults.top-k | 10 | Default retrieval size for general retrieval |
| contexa.rag.behavior.lookback-days | 30 | Behavioral retrieval lookback window |
| contexa.rag.risk.similarity-threshold | 0.8 | Risk-focused retrieval threshold |
| contexa.rag.risk.top-k | 50 | Risk-focused retrieval result count |
| contexa.rag.lab.batch-size | 50 | AI Lab batch size |
| contexa.rag.lab.validation-enabled | true | Enables validation before ingestion |
| contexa.rag.lab.enrichment-enabled | true | Enables metadata enrichment during AI Lab processing |
| contexa.rag.lab.top-k | 100 | AI Lab retrieval size |
| contexa.rag.lab.similarity-threshold | 0.75 | AI Lab similarity threshold |
| contexa.rag.etl.batch-size | 100 | ETL ingestion batch size |
| contexa.rag.etl.chunk-size | 500 | Chunk size for document splitting |
| contexa.rag.etl.chunk-overlap | 50 | Chunk overlap for document splitting |
| contexa.rag.etl.vector-table-name | vector_store | Logical target table name used by ETL workflows |
| contexa.rag.etl.behavior.retention-days | 90 | Behavioral corpus retention period |

Vector Store (pgvector)

| Property | Default | Description |
| --- | --- | --- |
| spring.ai.vectorstore.pgvector.index-type | HNSW | Index implementation |
| spring.ai.vectorstore.pgvector.distance-type | COSINE_DISTANCE | Distance metric |
| spring.ai.vectorstore.pgvector.dimensions | 1024 | Embedding dimension bound expected by the store |
| spring.ai.vectorstore.pgvector.batch-size | 100 | Store write batch size |
| spring.ai.vectorstore.pgvector.parallel-threads | 4 | Parallel worker count |
| spring.ai.vectorstore.pgvector.top-k | 100 | Default search result count |
| spring.ai.vectorstore.pgvector.similarity-threshold | 0.5 | Default search similarity threshold |
| spring.ai.vectorstore.pgvector.search-timeout-ms | 10000 | Search timeout |
| spring.ai.vectorstore.pgvector.store-timeout-ms | 10000 | Store timeout |
| spring.ai.vectorstore.pgvector.hnsw.m | 16 | HNSW graph connectivity |
| spring.ai.vectorstore.pgvector.hnsw.ef-construction | 64 | HNSW construction effort |
| spring.ai.vectorstore.pgvector.hnsw.ef-search | 100 | HNSW search effort |
| spring.ai.vectorstore.pgvector.ivfflat.lists | 100 | IVFFLAT list count |
| spring.ai.vectorstore.pgvector.ivfflat.probes | 10 | IVFFLAT probe count |
| spring.ai.vectorstore.pgvector.document.chunk-size | 1000 | Document chunk size |
| spring.ai.vectorstore.pgvector.document.chunk-overlap | 200 | Document chunk overlap |
| spring.ai.vectorstore.pgvector.document.enrich-metadata | true | Metadata enrichment toggle |
| spring.ai.vectorstore.pgvector.document.extract-keywords | true | Keyword extraction toggle |
| spring.ai.vectorstore.pgvector.document.generate-summary | false | Document summary generation toggle |

YAML
contexa:
  rag:
    defaults:
      similarity-threshold: 0.7
      top-k: 10
    risk:
      similarity-threshold: 0.8
      top-k: 50
    lab:
      batch-size: 50
      validation-enabled: true
      enrichment-enabled: true
spring:
  ai:
    vectorstore:
      pgvector:
        dimensions: 1024
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        document:
          chunk-size: 1000
          chunk-overlap: 200
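
Because a pgvector column has a fixed width, the configured dimensions value has to match the length of every embedding written to the store. The sketch below shows the kind of guard that catches a mismatch before a write. It is an illustration only: DimensionCheckSketch and requireDimensions are hypothetical names, not part of Contexa or Spring AI.

```java
// Hypothetical guard; the class and method names are illustrative only.
final class DimensionCheckSketch {

    // Throws if the embedding length differs from the configured store dimension.
    static void requireDimensions(float[] embedding, int configuredDimensions) {
        if (embedding.length != configuredDimensions) {
            throw new IllegalArgumentException(
                    "Embedding has " + embedding.length
                            + " dimensions, but the store is configured for " + configuredDimensions);
        }
    }

    public static void main(String[] args) {
        int configured = 1024; // spring.ai.vectorstore.pgvector.dimensions in the example above
        requireDimensions(new float[1024], configured); // same width: accepted
        try {
            requireDimensions(new float[1536], configured); // wider vector: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If you switch embedding providers, check the new model's output width against the dimensions value before ingesting documents.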

See also: Pipeline & RAG Reference

Streaming Configuration

| Property | Default | Description |
| --- | --- | --- |
| contexa.streaming.final-response-marker | ###FINAL_RESPONSE### | Marker emitted before the final structured response |
| contexa.streaming.streaming-marker | ###STREAMING### | Marker prefix for streaming chunks |
| contexa.streaming.json-start-marker | ===JSON_START=== | Marker indicating JSON output start |
| contexa.streaming.json-end-marker | ===JSON_END=== | Marker indicating JSON output end |
| contexa.streaming.timeout | PT5M | Overall streaming timeout |
| contexa.streaming.max-retries | 3 | Retry count for stream recovery |
| contexa.streaming.retry-delay | PT1S | Initial retry delay |
| contexa.streaming.retry-multiplier | 1.5 | Retry backoff multiplier |
| contexa.streaming.marker-buffer-size | 100 | Marker parsing buffer size |
| contexa.streaming.sentence-buffering-enabled | true | Sentence buffering toggle for chunk smoothing |

YAML
contexa:
  streaming:
    timeout: 5m
    max-retries: 3
    retry-delay: 1s
    retry-multiplier: 1.5
    marker-buffer-size: 100
    sentence-buffering-enabled: true
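
To get a feel for how retry-delay, retry-multiplier, and max-retries interact, the sketch below computes a retry schedule under one common interpretation: each delay is the previous one multiplied by the multiplier. That interpretation is an assumption, as is the absence of jitter or a delay cap; the class RetryBackoffSketch is hypothetical and is not the Contexa streaming implementation.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical backoff calculator; assumes plain exponential growth without jitter.
final class RetryBackoffSketch {

    // Returns one delay per retry: initialDelay, initialDelay * m, initialDelay * m^2, ...
    static List<Duration> delays(Duration initialDelay, double multiplier, int maxRetries) {
        List<Duration> result = new ArrayList<>();
        double millis = initialDelay.toMillis();
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            result.add(Duration.ofMillis(Math.round(millis)));
            millis *= multiplier;
        }
        return result;
    }

    public static void main(String[] args) {
        // Defaults from the table: retry-delay PT1S, retry-multiplier 1.5, max-retries 3.
        for (Duration d : delays(Duration.parse("PT1S"), 1.5, 3)) {
            System.out.println(d.toMillis() + " ms");
        }
    }
}
```

With the defaults, this interpretation gives waits of 1 s, 1.5 s, and 2.25 s, so three retries add at most 4.75 s of waiting, well inside the 5 minute streaming timeout.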

See also: Streaming Reference

Advisor Configuration

| Property | Default | Description |
| --- | --- | --- |
| contexa.advisor.chain-profile | STANDARD | Named advisor chain profile |
| contexa.advisor.security.enabled | true | Enable the security advisor |
| contexa.advisor.security.order | 50 | Execution order of the security advisor |
| contexa.advisor.security.require-authentication | false | Require an authenticated principal before AI processing |
| contexa.advisor.soar.approval.enabled | true | Enable the SOAR approval advisor |
| contexa.advisor.soar.approval.order | 100 | Execution order of the SOAR approval advisor |
| contexa.advisor.soar.approval.timeout | 300 | Approval timeout in seconds |

YAML
contexa:
  advisor:
    chain-profile: STANDARD
    security:
      enabled: true
      order: 50
      require-authentication: false
    soar:
      approval:
        enabled: true
        order: 100
        timeout: 300
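
The two order values determine where each advisor sits in the chain. The sketch below assumes the usual Spring ordering convention, in which a lower order value runs earlier; under that assumption the security advisor (50) runs before the SOAR approval advisor (100). AdvisorOrderSketch and AdvisorEntry are hypothetical names used only for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of order-based sorting; not the Contexa advisor registry.
final class AdvisorOrderSketch {

    static final class AdvisorEntry {
        final String name;
        final int order;

        AdvisorEntry(String name, int order) {
            this.name = name;
            this.order = order;
        }
    }

    // Sorts advisors ascending by order (assumption: lower values execute first).
    static List<String> executionOrder(List<AdvisorEntry> advisors) {
        return advisors.stream()
                .sorted(Comparator.comparingInt((AdvisorEntry a) -> a.order))
                .map(a -> a.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<AdvisorEntry> advisors = List.of(
                new AdvisorEntry("soar-approval", 100),
                new AdvisorEntry("security", 50));
        System.out.println(executionOrder(advisors));
    }
}
```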

See also: Advisor System Reference

Tuning Scenarios

Common configuration adjustments for specific situations:

Slow Responses

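If inference is slow, reduce the model footprint and widen the Layer 1 pipeline budget. The three timeout values are in milliseconds, so the example below sets a total budget of 5400000 ms (90 minutes), with 60 minutes for the LLM call and 20 minutes for RAG: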

YAML
contexa:
  llm:
    chat:
      ollama:
        model: qwen2.5:7b
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:7b
      tiered:
        layer1:
          timeout:
            total-ms: 5400000
            llm-ms: 3600000
            rag-ms: 1200000
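
When adjusting these three values, it helps to check that the total budget still covers the LLM and RAG budgets combined. The sketch below performs that arithmetic. It assumes the two stages run one after the other inside the total budget, which this page does not state explicitly; TimeoutBudgetSketch and headroomMs are hypothetical names.

```java
// Hypothetical budget check; assumes the LLM and RAG stages run sequentially.
final class TimeoutBudgetSketch {

    // Returns the time left in the total budget after the LLM and RAG budgets.
    static long headroomMs(long totalMs, long llmMs, long ragMs) {
        long headroom = totalMs - (llmMs + ragMs);
        if (headroom < 0) {
            throw new IllegalArgumentException(
                    "llm-ms + rag-ms exceeds total-ms by " + (-headroom) + " ms");
        }
        return headroom;
    }

    public static void main(String[] args) {
        // Default Layer 1 budgets from the property reference: 5000 / 3200 / 900.
        System.out.println(headroomMs(5000, 3200, 900));
        // Values from the slow-response example above.
        System.out.println(headroomMs(5_400_000, 3_600_000, 1_200_000));
    }
}
```

Under this reading the defaults leave 900 ms of headroom and the example above leaves 600000 ms (10 minutes).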

RAG Results Are Inaccurate

If retrieved context is not relevant enough, raise the similarity thresholds and lower top-k so that only the closest matches reach the prompt:

YAML
contexa:
  rag:
    defaults:
      similarity-threshold: 0.85
      top-k: 3
spring:
  ai:
    security:
      tiered:
        layer1:
          rag:
            similarity-threshold: 0.7
          vector-search-limit: 6

Token Usage / Cost Optimization

Reduce prompt size by lowering the Layer 1 prompt and truncation budgets:

YAML
spring:
  ai:
    security:
      tiered:
        layer1:
          prompt:
            max-rag-documents: 2
            max-similar-events: 1
          vector-search-limit: 3
        truncation:
          layer1:
            payload: 100
            rag-document: 150

Production Deployment

Recommended settings for a production deployment with separate chat and embedding runtimes:

YAML
contexa:
  llm:
    selection:
      chat:
        mode: DYNAMIC_PRIORITY
        priority: ollama,anthropic,openai
      embedding:
        mode: DYNAMIC_PRIORITY
        priority: ollama,openai
    embedding:
      ollama:
        dedicated-runtime-enabled: true
        base-url: http://localhost:11435
  advisor:
    security:
      enabled: true
      require-authentication: true
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:7b
      layer2:
        model: gpt-4o-mini
      tiered:
        vector-cache:
          max-size: 50000
          expire-minutes: 10
          enabled: true
          record-stats: true
        security:
          trusted-proxy-validation-enabled: true

Complete Property Reference

TieredStrategyProperties — Layer 1 Settings
Property names in this table are relative to spring.ai.security.tiered.layer1.

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| .rag.similarity-threshold | double | 0.5 | Layer 1 RAG similarity threshold |
| .session.max-recent-actions | int | 100 | Recent action window |
| .cache.max-size | int | 1000 | Layer 1 cache size |
| .cache.ttl-minutes | int | 30 | Layer 1 cache TTL |
| .timeout.total-ms | long | 5000 | Total Layer 1 budget (ms) |
| .timeout.llm-ms | long | 3200 | Layer 1 LLM budget (ms) |
| .timeout.rag-ms | long | 900 | Layer 1 RAG budget (ms) |
| .vector-search-limit | int | 3 | Vector search result cap |
| .default-budget-profile | String | CORTEX_L1_INTERACTIVE_STRICT | Named Layer 1 budget profile |
| .prompt.max-similar-events | int | 2 | Prompt similar-event cap |
| .prompt.max-rag-documents | int | 3 | Prompt RAG document cap |
| .prompt.include-event-id | boolean | false | Include event id in prompt |
| .prompt.include-raw-timestamp | boolean | false | Include raw timestamp in prompt |
| .prompt.include-raw-session-id | boolean | false | Include raw session id in prompt |
| .prompt.include-full-user-agent | boolean | false | Include full user agent in prompt |

TieredStrategyProperties — Layer 2 and Shared Settings
Property names in this table are relative to spring.ai.security.tiered.

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| .layer2.rag.similarity-threshold | double | 0.5 | Layer 2 RAG similarity threshold |
| .layer2.cache.max-size | int | 1000 | Layer 2 cache size |
| .layer2.cache.ttl-minutes | int | 30 | Layer 2 cache TTL |
| .layer2.timeout-ms | long | 7000 | Layer 2 timeout (ms) |
| .layer2.enable-soar | boolean | false | SOAR activation toggle |
| .layer2.rag-top-k | int | 5 | Layer 2 retrieval size |
| .layer2.default-budget-profile | String | CORTEX_L2_EXPERT_STRICT | Named Layer 2 budget profile |
| .truncation.layer1.user-agent | int | 150 | Layer 1 user-agent truncation |
| .truncation.layer1.payload | int | 200 | Layer 1 payload truncation |
| .truncation.layer1.rag-document | int | 180 | Layer 1 RAG truncation |
| .truncation.layer2.user-agent | int | 150 | Layer 2 user-agent truncation |
| .truncation.layer2.payload | int | 1000 | Layer 2 payload truncation |
| .truncation.layer2.rag-document | int | 500 | Layer 2 RAG truncation |
| .vector-cache.max-size | int | 10000 | Vector cache size |
| .vector-cache.expire-minutes | int | 5 | Vector cache TTL |
| .vector-cache.enabled | boolean | true | Vector cache toggle |
| .vector-cache.record-stats | boolean | true | Vector cache metrics toggle |
| .security.trusted-proxies | List<String> | [] | Trusted proxy list |
| .security.trusted-proxy-validation-enabled | boolean | true | Trusted proxy validation toggle |
| .prompt-compression.enabled | boolean | true | Runtime prompt compression toggle |
| .prompt-runtime.native-structured-output-enabled | boolean | true | Enable native structured output |
| .prompt-runtime.native-structured-output-disabled-profiles | List<String> | [] | Profiles where native structured output is disabled |
| .prompt-runtime.telemetry-enabled | boolean | true | Enable prompt runtime telemetry |

SecurityMappingProperties — spring.ai.security.mapping
| Property | Type | Default | Description |
| --- | --- | --- | --- |
| .task-to-tier | Map<String, Integer> | {} | Direct task name → tier (1/2/3) mapping |
| .task-to-analysis-level | Map<String, String> | {} | Task name → analysis level (QUICK/NORMAL/DEEP) mapping |
| .task-configs | Map<String, TaskConfig> | {} | Per-task detailed config (tier, analysisLevel, toolExecutionEnabled, requireFastResponse, preferLocalModel/CloudModel, temperature, timeoutMs, preferredModel, metadata) |
| .defaults.tier1-tasks | String[] | [THREAT_FILTERING, QUICK_DETECTION] | Default tier-1 task names |
| .defaults.tier2-tasks | String[] | [CONTEXTUAL_ANALYSIS, BEHAVIOR_ANALYSIS, CORRELATION] | Default tier-2 task names |
| .defaults.tier3-tasks | String[] | [EXPERT_INVESTIGATION, INCIDENT_RESPONSE, FORENSIC_ANALYSIS, SOAR_AUTOMATION, APPROVAL_WORKFLOW] | Default tier-3 task names |
| .defaults.default-tier | Integer | 2 | Default tier for unmapped tasks |
| .defaults.default-analysis-level | String | NORMAL | Default analysis level for unmapped tasks |
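
One plausible way these mapping properties combine is sketched below: an explicit task-to-tier entry wins, then the default task lists are consulted, and anything still unmapped gets default-tier. That lookup order is an assumption made for illustration, and TaskTierSketch is a hypothetical class rather than the SecurityMappingProperties implementation.

```java
import java.util.List;
import java.util.Map;

// Hypothetical tier resolution; the lookup order is an assumption, not documented behavior.
final class TaskTierSketch {

    // Default task lists and default tier copied from the table above.
    static final List<String> TIER1 = List.of("THREAT_FILTERING", "QUICK_DETECTION");
    static final List<String> TIER2 = List.of("CONTEXTUAL_ANALYSIS", "BEHAVIOR_ANALYSIS", "CORRELATION");
    static final List<String> TIER3 = List.of("EXPERT_INVESTIGATION", "INCIDENT_RESPONSE",
            "FORENSIC_ANALYSIS", "SOAR_AUTOMATION", "APPROVAL_WORKFLOW");
    static final int DEFAULT_TIER = 2;

    // Explicit mapping first, then the default lists, then the default tier.
    static int resolveTier(String task, Map<String, Integer> taskToTier) {
        Integer explicit = taskToTier.get(task);
        if (explicit != null) {
            return explicit;
        }
        if (TIER1.contains(task)) return 1;
        if (TIER2.contains(task)) return 2;
        if (TIER3.contains(task)) return 3;
        return DEFAULT_TIER;
    }

    public static void main(String[] args) {
        Map<String, Integer> overrides = Map.of("QUICK_DETECTION", 3); // illustrative override
        System.out.println(resolveTier("QUICK_DETECTION", overrides));
        System.out.println(resolveTier("FORENSIC_ANALYSIS", overrides));
        System.out.println(resolveTier("SOME_UNMAPPED_TASK", overrides));
    }
}
```
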
ContexaLlmBindingProperties — contexa.llm.bindings
| Property | Type | Default | Description |
| --- | --- | --- | --- |
| .chat | Map<String, Binding> | {} | Chat model bindings (key = logical name) |
| .embedding | Map<String, Binding> | {} | Embedding model bindings (key = logical name) |

Binding fields

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| .bean-name | String | "" | Spring AI bean name to reference |
| .provider | String | "" | Provider identifier (ollama/openai/anthropic, etc.) |
| .model-id | String | "" | Model identifier (e.g., qwen2.5:7b) |
| .aliases | List<String> | [] | Model aliases |
| .enabled | boolean | true | Binding enabled flag |
| .primary | boolean | false | Primary binding for this kind |

YAML
contexa:
  llm:
    bindings:
      chat:
        primary-ollama:
          bean-name: contexaOllamaChatModel
          provider: ollama
          model-id: qwen2.5:7b
          aliases: [default, fast]
          enabled: true
          primary: true
      embedding:
        primary-ollama:
          bean-name: contexaSharedOllamaEmbeddingModel
          provider: ollama
          model-id: mxbai-embed-large
          enabled: true
          primary: true
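
The aliases, enabled, and primary fields suggest a lookup along the following lines: match a requested name against the logical key, model id, or aliases of an enabled binding, and otherwise fall back to the enabled primary binding. The matching rules and the fallback are assumptions made for illustration; BindingLookupSketch and its nested Binding class are hypothetical and do not mirror ContexaLlmBindingProperties.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup over contexa.llm.bindings.chat entries; not Contexa source code.
final class BindingLookupSketch {

    static final class Binding {
        final String modelId;
        final List<String> aliases;
        final boolean enabled;
        final boolean primary;

        Binding(String modelId, List<String> aliases, boolean enabled, boolean primary) {
            this.modelId = modelId;
            this.aliases = aliases;
            this.enabled = enabled;
            this.primary = primary;
        }
    }

    // Returns the logical binding name for a requested key, model id, or alias.
    // Falls back to the enabled primary binding when nothing matches (assumed behavior).
    static Optional<String> resolve(Map<String, Binding> bindings, String requested) {
        for (Map.Entry<String, Binding> entry : bindings.entrySet()) {
            Binding b = entry.getValue();
            boolean matches = entry.getKey().equals(requested)
                    || b.modelId.equals(requested)
                    || b.aliases.contains(requested);
            if (b.enabled && matches) {
                return Optional.of(entry.getKey());
            }
        }
        return bindings.entrySet().stream()
                .filter(e -> e.getValue().enabled && e.getValue().primary)
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        // Mirrors the chat binding from the YAML example above.
        Map<String, Binding> chat = Map.of(
                "primary-ollama", new Binding("qwen2.5:7b", List.of("default", "fast"), true, true));
        System.out.println(resolve(chat, "fast"));
    }
}
```
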
Contexa RAG, Advisor, Streaming, and PgVector
| Property | Default | Description |
| --- | --- | --- |
| contexa.rag.behavior.lookback-days | 30 | Behavioral retrieval lookback window |
| contexa.rag.risk.similarity-threshold | 0.8 | Risk retrieval threshold |
| contexa.rag.risk.top-k | 50 | Risk retrieval size |
| contexa.rag.etl.batch-size | 100 | ETL batch size |
| contexa.rag.etl.chunk-size | 500 | ETL chunk size |
| contexa.rag.etl.chunk-overlap | 50 | ETL chunk overlap |
| contexa.rag.etl.behavior.retention-days | 90 | Behavior retention period |
| contexa.advisor.chain-profile | STANDARD | Advisor chain profile |
| contexa.advisor.soar.approval.enabled | true | SOAR approval advisor toggle |
| contexa.advisor.soar.approval.order | 100 | SOAR approval advisor order |
| contexa.advisor.soar.approval.timeout | 300 | SOAR approval timeout |
| contexa.streaming.timeout | PT5M | Streaming timeout |
| contexa.streaming.max-retries | 3 | Streaming retry count |
| contexa.streaming.retry-delay | PT1S | Streaming retry delay |
| contexa.streaming.retry-multiplier | 1.5 | Streaming retry multiplier |
| spring.ai.vectorstore.pgvector.dimensions | 1024 | Embedding dimension |
| spring.ai.vectorstore.pgvector.parallel-threads | 4 | Parallel worker count |
| spring.ai.vectorstore.pgvector.search-timeout-ms | 10000 | Search timeout |
| spring.ai.vectorstore.pgvector.store-timeout-ms | 10000 | Store timeout |
| spring.ai.vectorstore.pgvector.document.chunk-size | 1000 | Document chunk size |
| spring.ai.vectorstore.pgvector.document.chunk-overlap | 200 | Document chunk overlap |