Markus Dezelak

Production-Grade RAG System

A retrieval-augmented generation (RAG) pipeline designed for real documentation: security reports, infrastructure runbooks, and architecture diagrams. The focus is on reliability, debuggability, and systematic evaluation.

Architecture

  • Document ingestion from files and APIs.
  • Chunking tuned to the task: by heading, by semantic similarity, or fixed-size with overlap.
  • Vector store using FAISS / Chroma-like backends.
  • Embeddings from OpenAI / Claude-style models.
  • Top-k and MMR-style retrieval for context selection.
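A minimal sketch of the chunking and retrieval steps above, using toy vectors and pure-Python cosine similarity in place of the FAISS/Chroma backends and hosted embedding models. All function names and parameters here are illustrative assumptions, not the project's actual API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def chunk_fixed(text, size=400, overlap=100):
    """Fixed-size character chunking with overlap (one of the three strategies).
    Assumes overlap < size."""
    chunks, step = [], size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

def top_k(query_vec, doc_vecs, k=2):
    """Plain top-k retrieval: rank chunks by cosine similarity to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

def mmr(query_vec, doc_vecs, k=2, lam=0.4):
    """Maximal Marginal Relevance: trade relevance against redundancy
    with already-selected chunks. lam weights relevance vs. diversity."""
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j])
                              for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a duplicated chunk in the index, plain top-k returns the duplicate while MMR swaps in a diverse chunk instead, which is why MMR appears alongside top-k in the list above.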

Evaluation & Metrics

  • Context hit rate on ground-truth QA pairs.
  • Answer correctness judged by a separate LLM grader.
  • Failure case logging: missing context, irrelevant chunks, hallucinations.
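The metrics above can be sketched as follows; this is a minimal illustration assuming chunks carry stable ids and ground truth is a set of gold chunk ids per question. The function names are hypothetical, and the LLM-grader step is omitted.

```python
def context_hit_rate(retrieved, ground_truth):
    """Fraction of QA pairs where at least one gold chunk was retrieved.

    retrieved:    list of lists of chunk ids returned per question.
    ground_truth: list of sets of chunk ids known to contain the answer.
    """
    hits = sum(1 for got, gold in zip(retrieved, ground_truth)
               if set(got) & set(gold))
    return hits / len(ground_truth) if ground_truth else 0.0

def log_failures(questions, retrieved, ground_truth):
    """Collect missing-context failure cases: questions where no gold
    chunk was retrieved, plus which chunk ids were missed."""
    return [
        {"question": q, "missing": sorted(set(gold) - set(got))}
        for q, got, gold in zip(questions, retrieved, ground_truth)
        if not set(got) & set(gold)
    ]
```

Logging the missed chunk ids per failing question makes it possible to distinguish retrieval misses from downstream hallucinations when triaging eval runs.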

Usage

The RAG stack is designed to slot into AI assistants like Black Halo or SOC-style copilots, giving them grounded context for their reasoning while preserving traceability back to original documents.