AI Engine Configuration

Configuration properties for the Contexa AI engine, including the tiered LLM strategy, RAG, advisor chain, streaming pipeline, and tuning scenarios.

Quick Start: Minimal Config

The minimum configuration to run the OSS AI path with the built-in Ollama chat runtime and the tiered security models:

YAML
contexa:
  llm:
    chat:
      ollama:
        base-url: http://localhost:11434
        model: qwen2.5:14b
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:14b
      layer2:
        model: exaone3.5:latest
    vectorstore:
      pgvector:
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1024

This enables the Ollama chat runtime through contexa.llm.chat.ollama.*, assigns Tier 1 and Tier 2 models, and configures the project-declared pgvector properties. If you want a dedicated embedding runtime, add contexa.llm.embedding.ollama.* as shown below.

Configuration Architecture

The AI runtime is configured through seven property groups. Contexa owns the runtime selection, advisor, RAG, and streaming properties, while Spring AI owns the provider integrations and pgvector base configuration.

| Property Prefix | Bound To | Configures |
| --- | --- | --- |
| contexa.llm.* | ContexaProperties | Chat runtime selection, Ollama endpoints, model priority, dedicated embedding runtime |
| spring.ai.security.* | TieredLLMProperties | Tier 1 and Tier 2 model selection, backup models, helper timeout accessors |
| spring.ai.security.tiered.* | TieredStrategyProperties | Prompt budgets, truncation, vector cache, trusted proxy validation, RAG thresholds |
| contexa.advisor.* | ContexaAdvisorProperties | Advisor chain profile, security advisor order, SOAR approval advisor settings |
| contexa.rag.* | ContexaRagProperties | Default, behavior, risk, AI Lab, and ETL retrieval settings |
| contexa.streaming.* | StreamingProperties | Protocol markers, retry policy, parser buffers, streaming timeout |
| spring.ai.vectorstore.pgvector.* | PgVectorStoreProperties | Index type, dimensions, search/store limits, document chunking, HNSW/IVFFLAT tuning |

LLM Model Configuration

The tiered model system is split between runtime selection in contexa.llm.* and tier assignment in spring.ai.security.*. The OSS core ships with Ollama chat runtime wiring and can also use Spring AI OpenAI or Anthropic providers if those beans are present.

| Property | Default | Description |
| --- | --- | --- |
| contexa.llm.enabled | true | Master flag for Contexa LLM-dependent features |
| contexa.llm.advisor-enabled | true | Enables the advisor chain registration path |
| contexa.llm.chat-model-priority | ollama,anthropic,openai | Provider preference order for the primary chat model |
| contexa.llm.embedding-model-priority | ollama,openai | Provider preference order for the primary embedding model |
| contexa.llm.chat.ollama.base-url | "" | Required to enable the built-in Ollama chat runtime |
| contexa.llm.chat.ollama.model | "" | Optional explicit Ollama chat model; if omitted, the auto-configuration falls back to qwen3:8b |
| contexa.llm.chat.ollama.keep-alive | "" | Optional Ollama keep-alive hint passed to chat options |
| contexa.llm.embedding.ollama.dedicated-runtime-enabled | false | Enables a dedicated Ollama embedding runtime instead of reusing the chat runtime |
| contexa.llm.embedding.ollama.base-url | "" | Required when the dedicated embedding runtime is enabled |
| contexa.llm.embedding.ollama.model | "" | Optional explicit embedding model; if omitted, the auto-configuration falls back to mxbai-embed-large |
| spring.ai.security.layer1.model | qwen2.5:14b | Tier 1 model selection |
| spring.ai.security.layer1.backup.model | (unset) | Tier 1 backup model |
| spring.ai.security.layer2.model | exaone3.5:latest | Tier 2 model selection |
| spring.ai.security.layer2.backup.model | (unset) | Tier 2 backup model |
| spring.ai.security.tiered.layer1.timeout-ms | 30000 | Tier 1 inference timeout; TieredLLMProperties falls back to this default when the configured value is unset or invalid |
| spring.ai.security.tiered.layer2.timeout-ms | 60000 | Tier 2 inference timeout; TieredLLMProperties falls back to this default when the configured value is unset or invalid |
YAML
contexa:
  llm:
    chat-model-priority: ollama,anthropic,openai
    embedding-model-priority: ollama,openai
    chat:
      ollama:
        base-url: http://localhost:11434
        model: qwen2.5:14b
    embedding:
      ollama:
        dedicated-runtime-enabled: true
        base-url: http://localhost:11435
        model: mxbai-embed-large
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:14b
        backup:
          model: llama3.2:latest
      layer2:
        model: exaone3.5:latest
        backup:
          model: deepseek-r1:14b
      tiered:
        layer1:
          timeout-ms: 30000
        layer2:
          timeout-ms: 60000

OpenAI and Anthropic provider credentials are configured through Spring AI's own provider properties such as spring.ai.openai.* and spring.ai.anthropic.*; Contexa only selects between the available provider beans.
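For example, a minimal sketch of wiring those credentials through Spring AI's standard provider properties, assuming the API keys are supplied via environment variables:

YAML
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
    anthropic:
      api-key: ${ANTHROPIC_API_KEY}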

See also: LLM & Models Reference

RAG Configuration

These properties drive the retrieval path used by the pipeline and AI Lab. The OSS code binds contexa.rag.* to ContexaRagProperties and the pgvector settings to spring.ai.vectorstore.pgvector.*.

| Property | Default | Description |
| --- | --- | --- |
| contexa.rag.defaults.similarity-threshold | 0.7 | Default similarity threshold for general retrieval |
| contexa.rag.defaults.top-k | 10 | Default retrieval size for general retrieval |
| contexa.rag.behavior.lookback-days | 30 | Behavioral retrieval lookback window |
| contexa.rag.risk.similarity-threshold | 0.8 | Risk-focused retrieval threshold |
| contexa.rag.risk.top-k | 50 | Risk-focused retrieval result count |
| contexa.rag.lab.batch-size | 50 | AI Lab batch size |
| contexa.rag.lab.validation-enabled | true | Enables validation before ingestion |
| contexa.rag.lab.enrichment-enabled | true | Enables metadata enrichment during AI Lab processing |
| contexa.rag.lab.top-k | 100 | AI Lab retrieval size |
| contexa.rag.lab.similarity-threshold | 0.75 | AI Lab similarity threshold |
| contexa.rag.etl.batch-size | 100 | ETL ingestion batch size |
| contexa.rag.etl.chunk-size | 500 | Chunk size for document splitting |
| contexa.rag.etl.chunk-overlap | 50 | Chunk overlap for document splitting |
| contexa.rag.etl.vector-table-name | vector_store | Logical target table name used by ETL workflows |
| contexa.rag.etl.behavior.retention-days | 90 | Behavioral corpus retention period |
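Expressed as YAML, the ETL settings above with their default values:

YAML
contexa:
  rag:
    etl:
      batch-size: 100
      chunk-size: 500
      chunk-overlap: 50
      vector-table-name: vector_store
      behavior:
        retention-days: 90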

Vector Store (pgvector)

| Property | Default | Description |
| --- | --- | --- |
| spring.ai.vectorstore.pgvector.index-type | HNSW | Index implementation |
| spring.ai.vectorstore.pgvector.distance-type | COSINE_DISTANCE | Distance metric |
| spring.ai.vectorstore.pgvector.dimensions | 1024 | Embedding dimension bound expected by the store |
| spring.ai.vectorstore.pgvector.batch-size | 100 | Store write batch size |
| spring.ai.vectorstore.pgvector.parallel-threads | 4 | Parallel worker count |
| spring.ai.vectorstore.pgvector.top-k | 100 | Default search result count |
| spring.ai.vectorstore.pgvector.similarity-threshold | 0.5 | Default search similarity threshold |
| spring.ai.vectorstore.pgvector.search-timeout-ms | 10000 | Search timeout |
| spring.ai.vectorstore.pgvector.store-timeout-ms | 10000 | Store timeout |
| spring.ai.vectorstore.pgvector.hnsw.m | 16 | HNSW graph connectivity |
| spring.ai.vectorstore.pgvector.hnsw.ef-construction | 64 | HNSW construction effort |
| spring.ai.vectorstore.pgvector.hnsw.ef-search | 100 | HNSW search effort |
| spring.ai.vectorstore.pgvector.ivfflat.lists | 100 | IVFFLAT list count |
| spring.ai.vectorstore.pgvector.ivfflat.probes | 10 | IVFFLAT probe count |
| spring.ai.vectorstore.pgvector.document.chunk-size | 1000 | Document chunk size |
| spring.ai.vectorstore.pgvector.document.chunk-overlap | 200 | Document chunk overlap |
| spring.ai.vectorstore.pgvector.document.enrich-metadata | true | Metadata enrichment toggle |
| spring.ai.vectorstore.pgvector.document.extract-keywords | true | Keyword extraction toggle |
| spring.ai.vectorstore.pgvector.document.generate-summary | false | Document summary generation toggle |
YAML
contexa:
  rag:
    defaults:
      similarity-threshold: 0.7
      top-k: 10
    risk:
      similarity-threshold: 0.8
      top-k: 50
    lab:
      batch-size: 50
      validation-enabled: true
      enrichment-enabled: true
spring:
  ai:
    vectorstore:
      pgvector:
        dimensions: 1024
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        document:
          chunk-size: 1000
          chunk-overlap: 200

See also: Pipeline & RAG Reference

Streaming Configuration

| Property | Default | Description |
| --- | --- | --- |
| contexa.streaming.final-response-marker | ###FINAL_RESPONSE### | Marker emitted before the final structured response |
| contexa.streaming.streaming-marker | ###STREAMING### | Marker prefix for streaming chunks |
| contexa.streaming.json-start-marker | ===JSON_START=== | Marker indicating JSON output start |
| contexa.streaming.json-end-marker | ===JSON_END=== | Marker indicating JSON output end |
| contexa.streaming.timeout | PT5M | Overall streaming timeout |
| contexa.streaming.max-retries | 3 | Retry count for stream recovery |
| contexa.streaming.retry-delay | PT1S | Initial retry delay |
| contexa.streaming.retry-multiplier | 1.5 | Retry backoff multiplier |
| contexa.streaming.marker-buffer-size | 100 | Marker parsing buffer size |
| contexa.streaming.sentence-buffering-enabled | true | Sentence buffering toggle for chunk smoothing |
YAML
contexa:
  streaming:
    timeout: 5m
    max-retries: 3
    retry-delay: 1s
    retry-multiplier: 1.5
    marker-buffer-size: 100
    sentence-buffering-enabled: true
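The protocol markers can be overridden as well; this sketch simply restates the defaults from the table above (quoted so YAML does not treat the leading # as a comment):

YAML
contexa:
  streaming:
    final-response-marker: "###FINAL_RESPONSE###"
    streaming-marker: "###STREAMING###"
    json-start-marker: "===JSON_START==="
    json-end-marker: "===JSON_END==="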

See also: Streaming Reference

Advisor Configuration

| Property | Default | Description |
| --- | --- | --- |
| contexa.advisor.chain-profile | STANDARD | Named advisor chain profile |
| contexa.advisor.security.enabled | true | Enable the security advisor |
| contexa.advisor.security.order | 50 | Execution order of the security advisor |
| contexa.advisor.security.require-authentication | false | Require an authenticated principal before AI processing |
| contexa.advisor.soar.approval.enabled | true | Enable the SOAR approval advisor |
| contexa.advisor.soar.approval.order | 100 | Execution order of the SOAR approval advisor |
| contexa.advisor.soar.approval.timeout | 300 | Approval timeout in seconds |
YAML
contexa:
  advisor:
    chain-profile: STANDARD
    security:
      enabled: true
      order: 50
      require-authentication: false
    soar:
      approval:
        enabled: true
        order: 100
        timeout: 300

See also: Advisor System Reference

Tuning Scenarios

Common configuration adjustments for specific situations:

Slow Responses

If inference is slow, reduce the model footprint and widen the Layer 1 pipeline budget:

YAML
contexa:
  llm:
    chat:
      ollama:
        model: qwen2.5:7b
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:7b
      tiered:
        layer1:
          timeout:
            total-ms: 5400000
            llm-ms: 3600000
            rag-ms: 1200000

RAG Results Are Inaccurate

If retrieved context is not relevant enough, tighten the default threshold and reduce the retrieval window:

YAML
contexa:
  rag:
    defaults:
      similarity-threshold: 0.85
      top-k: 3
spring:
  ai:
    security:
      tiered:
        layer1:
          rag:
            similarity-threshold: 0.7
          vector-search-limit: 6

Token Usage / Cost Optimization

Reduce prompt size by lowering the Layer 1 prompt and truncation budgets:

YAML
spring:
  ai:
    security:
      tiered:
        layer1:
          prompt:
            max-rag-documents: 2
            max-similar-events: 1
          vector-search-limit: 3
        truncation:
          layer1:
            payload: 100
            rag-document: 150

Production Deployment

Recommended settings for a production deployment with separate chat and embedding runtimes:

YAML
contexa:
  llm:
    chat-model-priority: ollama,anthropic,openai
    embedding-model-priority: ollama,openai
    embedding:
      ollama:
        dedicated-runtime-enabled: true
        base-url: http://localhost:11435
  advisor:
    security:
      enabled: true
      require-authentication: true
spring:
  ai:
    security:
      layer1:
        model: qwen2.5:14b
      layer2:
        model: exaone3.5:latest
      tiered:
        vector-cache:
          max-size: 50000
          expire-minutes: 10
          enabled: true
          record-stats: true
        security:
          trusted-proxy-validation-enabled: true

Complete Property Reference

TieredStrategyProperties — Layer 1 Settings

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| .rag.similarity-threshold | double | 0.5 | Layer 1 RAG similarity threshold |
| .session.max-recent-actions | int | 100 | Recent action window |
| .cache.max-size | int | 1000 | Layer 1 cache size |
| .cache.ttl-minutes | int | 30 | Layer 1 cache TTL |
| .timeout.total-ms | long | 4500000 | Total Layer 1 budget |
| .timeout.llm-ms | long | 3000000 | Layer 1 LLM budget |
| .timeout.rag-ms | long | 1000000 | Layer 1 RAG budget |
| .vector-search-limit | int | 12 | Vector search result cap |
| .default-budget-profile | String | CORTEX_L1_STANDARD | Named Layer 1 budget profile |
| .prompt.max-similar-events | int | 3 | Prompt similar-event cap |
| .prompt.max-rag-documents | int | 12 | Prompt RAG document cap |
| .prompt.include-event-id | boolean | false | Include event id in prompt |
| .prompt.include-raw-timestamp | boolean | false | Include raw timestamp in prompt |
| .prompt.include-raw-session-id | boolean | false | Include raw session id in prompt |
| .prompt.include-full-user-agent | boolean | false | Include full user agent in prompt |
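Assuming the relative paths above nest under spring.ai.security.tiered.layer1.* (consistent with the tuning examples earlier on this page), the Layer 1 defaults sketch out as:

YAML
spring:
  ai:
    security:
      tiered:
        layer1:
          rag:
            similarity-threshold: 0.5
          vector-search-limit: 12
          timeout:
            total-ms: 4500000
            llm-ms: 3000000
            rag-ms: 1000000
          prompt:
            max-similar-events: 3
            max-rag-documents: 12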
TieredStrategyProperties — Layer 2 and Shared Settings

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| .layer2.rag.similarity-threshold | double | 0.5 | Layer 2 RAG similarity threshold |
| .layer2.cache.max-size | int | 1000 | Layer 2 cache size |
| .layer2.cache.ttl-minutes | int | 30 | Layer 2 cache TTL |
| .layer2.timeout-ms | long | 100000 | Layer 2 timeout |
| .layer2.enable-soar | boolean | false | SOAR activation toggle |
| .layer2.rag-top-k | int | 10 | Layer 2 retrieval size |
| .layer2.default-budget-profile | String | CORTEX_L2_STANDARD | Named Layer 2 budget profile |
| .truncation.layer1.user-agent | int | 150 | Layer 1 user-agent truncation |
| .truncation.layer1.payload | int | 200 | Layer 1 payload truncation |
| .truncation.layer1.rag-document | int | 300 | Layer 1 RAG truncation |
| .truncation.layer2.user-agent | int | 150 | Layer 2 user-agent truncation |
| .truncation.layer2.payload | int | 1000 | Layer 2 payload truncation |
| .truncation.layer2.rag-document | int | 500 | Layer 2 RAG truncation |
| .vector-cache.max-size | int | 10000 | Vector cache size |
| .vector-cache.expire-minutes | int | 5 | Vector cache TTL |
| .vector-cache.enabled | boolean | true | Vector cache toggle |
| .vector-cache.record-stats | boolean | true | Vector cache metrics toggle |
| .security.trusted-proxies | List<String> | [] | Trusted proxy list |
| .security.trusted-proxy-validation-enabled | boolean | true | Trusted proxy validation toggle |
| .prompt-compression.enabled | boolean | true | Runtime prompt compression toggle |
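Similarly, assuming the Layer 2 and shared paths nest directly under spring.ai.security.tiered.* (matching the production deployment example above), a sketch of the defaults:

YAML
spring:
  ai:
    security:
      tiered:
        layer2:
          rag:
            similarity-threshold: 0.5
          rag-top-k: 10
          timeout-ms: 100000
          enable-soar: false
        truncation:
          layer2:
            payload: 1000
            rag-document: 500
        vector-cache:
          max-size: 10000
          expire-minutes: 5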
Contexa RAG, Advisor, Streaming, and PgVector

| Property | Default | Description |
| --- | --- | --- |
| contexa.rag.behavior.lookback-days | 30 | Behavioral retrieval lookback window |
| contexa.rag.risk.similarity-threshold | 0.8 | Risk retrieval threshold |
| contexa.rag.risk.top-k | 50 | Risk retrieval size |
| contexa.rag.etl.batch-size | 100 | ETL batch size |
| contexa.rag.etl.chunk-size | 500 | ETL chunk size |
| contexa.rag.etl.chunk-overlap | 50 | ETL chunk overlap |
| contexa.rag.etl.behavior.retention-days | 90 | Behavior retention period |
| contexa.advisor.chain-profile | STANDARD | Advisor chain profile |
| contexa.advisor.soar.approval.enabled | true | SOAR approval advisor toggle |
| contexa.advisor.soar.approval.order | 100 | SOAR approval advisor order |
| contexa.advisor.soar.approval.timeout | 300 | SOAR approval timeout |
| contexa.streaming.timeout | PT5M | Streaming timeout |
| contexa.streaming.max-retries | 3 | Streaming retry count |
| contexa.streaming.retry-delay | PT1S | Streaming retry delay |
| contexa.streaming.retry-multiplier | 1.5 | Streaming retry multiplier |
| spring.ai.vectorstore.pgvector.dimensions | 1024 | Embedding dimension |
| spring.ai.vectorstore.pgvector.parallel-threads | 4 | Parallel worker count |
| spring.ai.vectorstore.pgvector.search-timeout-ms | 10000 | Search timeout |
| spring.ai.vectorstore.pgvector.store-timeout-ms | 10000 | Store timeout |
| spring.ai.vectorstore.pgvector.document.chunk-size | 1000 | Document chunk size |
| spring.ai.vectorstore.pgvector.document.chunk-overlap | 200 | Document chunk overlap |