RAG & Vector Operations

Overview

Vector operations power the CONTEXT_RETRIEVAL step of the AI pipeline. Documents are stored with metadata, embedded via Spring AI's embedding model, and persisted through the configured Spring AI VectorStore. When PgVector is configured, the same contracts operate on top of PgVector storage. Similarity searches retrieve relevant context before LLM execution.

The architecture has two layers:

UnifiedVectorService — General-purpose vector operations with cache integration. Used for standard document storage and retrieval.
AbstractVectorLabService — Base class for domain-specific vector services. Adds lab-specific metadata enrichment, validation, metrics tracking, and filter expressions. BehaviorVectorService is the primary implementation for behavioral analysis data.

VectorOperations

The core interface for all vector store interactions.

public interface VectorOperations

storeDocument(Document document) void

Stores a single document with its metadata and text content into the vector store.

storeDocuments(List<Document> documents) void

Batch stores multiple documents. Uses configurable batch sizes for large document sets.

searchSimilar(String query) List<Document>

Searches for documents similar to the query text using default topK and similarity threshold settings.

searchSimilar(String query, Map<String, Object> filters) List<Document>

Searches with metadata filters. Filter keys are matched with equality operators, lists use IN, and maps with "gte"/"lte" keys are treated as range filters.

searchSimilar(SearchRequest searchRequest) List<Document>

Advanced search with full control over topK, similarity threshold, and filter expressions via Spring AI's SearchRequest.

deleteDocuments(List<String> documentIds) void

Deletes documents by their IDs from the vector store.

UnifiedVectorService

General-purpose implementation with cache integration through VectorStoreCacheLayer.

public class UnifiedVectorService implements VectorOperations

Key Behaviors

Enriches documents with auto-generated id, timestamp, documentType, and version metadata.
Uses VectorStoreCacheLayer for search result caching. Cache is invalidated on write operations.
Batch inserts use the configured batchSize from PgVectorStoreProperties.
All write operations are @Transactional.

AbstractVectorLabService

Base class for domain-specific vector services that adds lab-aware processing, metrics, enrichment, and validation.

public abstract class AbstractVectorLabService implements VectorOperations

Abstract Methods (Must Implement)

getLabName() String

Returns the lab name used for metrics, logging, and metadata tagging.

getDocumentType() String

Returns the document type identifier stored in metadata.

enrichLabSpecificMetadata(Document document) Document

Enriches the document with domain-specific metadata. Called during preprocessing when enrichment is enabled.

validateLabSpecificDocument(Document document) void

Validates the document against domain-specific rules. Called during preprocessing when validation is enabled.

Metrics Integration

When a VectorStoreMetrics bean is available, all operations (STORE, SEARCH, DELETE) are automatically tracked with operation counts, durations, and error rates per lab.

BehaviorVectorService

Specialized vector service for storing and querying user behavioral analysis data. Used by the HCAD system for anomaly detection.

public class BehaviorVectorService extends AbstractVectorLabService

storeBehavior(BehavioralAnalysisContext context) void

Stores a user behavior pattern including user ID, IP address, session data, user agent, and activity details.

storeThreatPattern(BehavioralAnalysisContext context, BehavioralAnalysisResponse response) void

Stores a detected threat pattern for future similarity matching.

storeAnalysisResult(BehavioralAnalysisContext context, BehavioralAnalysisResponse response) void

Stores the result of a behavioral analysis for historical reference.

findSimilarBehaviors(String userId, String ip, String path, int topK) List<Document>

Finds behavior patterns similar to the given parameters. Filters by userId and lab name.

Metadata Enrichment

Automatically adds temporal metadata (hour, dayOfWeek, isWeekend) and network metadata (networkSegment extracted from IP address) to each stored document.

Configuration

YAML

spring:
  ai:
    vectorstore:
      pgvector:
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 384

contexa:
  rag:
    lab:
      batch-size: 50
      top-k: 100
      similarity-threshold: 0.75
      validation-enabled: true
      enrichment-enabled: true

Code Examples

Storing and Searching Documents

Java

// Store a security policy document
Document policyDoc = new Document(
        "Users must use MFA for admin operations",
        Map.of("documentType", "POLICY", "category", "auth"));

unifiedVectorService.storeDocument(policyDoc);

// Search for relevant context
List<Document> context = unifiedVectorService.searchSimilar(
        "multi-factor authentication requirements",
        Map.of("category", "auth"));

Behavior Pattern Storage