Markus Dezelak

LLM Fine-Tuning With LoRA

Fine-tuned an open-source LLM with LoRA adapters on security- and infrastructure-focused instructions. The goal was to improve structured remediation guidance and reduce hallucinations in technical workflows.

Model & Dataset

  • Base model: 7B–8B open-source LLM (Llama-style architecture).
  • Fine-tuning with PEFT LoRA adapters (QLoRA-style quantization).
  • Custom dataset of security tickets, scan findings, remediation steps, and cloud infrastructure tasks.
  • Split into train / validation with held-out eval prompts.
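The train / validation split with held-out eval prompts can be sketched in plain Python. This is an illustrative sketch, not the actual data pipeline; the record shape, split sizes, and `split_dataset` helper are assumptions for the example.

```python
import random

def split_dataset(examples, val_fraction=0.1, eval_holdout=5, seed=42):
    """Shuffle and split instruction examples into train / validation,
    reserving a fixed set of eval prompts that never enter training."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    # Held-out eval prompts: used only for manual / automatic checks.
    eval_prompts = shuffled[:eval_holdout]
    rest = shuffled[eval_holdout:]
    n_val = max(1, int(len(rest) * val_fraction))
    return rest[n_val:], rest[:n_val], eval_prompts

# Toy records standing in for security tickets / scan findings.
data = [{"instruction": f"Remediate finding {i}"} for i in range(100)]
train, val, evals = split_dataset(data)
```

Seeding the shuffle keeps the split reproducible across runs, so eval-loss comparisons between checkpoints stay meaningful.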

Training Setup

  • LoRA rank ~32, target modules in attention layers.
  • Optimizer: AdamW, learning rate in the 2e-4 range.
  • Trained on a single GPU with gradient accumulation.
  • Checkpointing and early stopping based on eval loss.
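At the core of the setup above is the LoRA update itself: instead of training the full weight matrix W, only a low-rank product B·A is trained and added back scaled by alpha/r. A minimal pure-Python sketch of that adapter math with toy 2×2 dimensions (the real run uses rank ~32 inside a PEFT-managed model, not hand-rolled matrices):

```python
def matmul(A, B):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_effective_weight(W, A, B, alpha, r):
    """Merged weight W' = W + (alpha / r) * B @ A.
    A (r x d_in) is randomly initialized, B (d_out x r) starts at zero,
    so the adapter is a no-op at step 0 and only the factors train."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Toy 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]           # r=1, d_in=2
B = [[0.0], [0.0]]         # zero-init: merged weight equals base weight
print(lora_effective_weight(W, A, B, alpha=16, r=1))
```

The zero-initialized B is why fine-tuning starts from exactly the base model's behavior, and why trainable parameters scale with r rather than with the full weight dimensions.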

Evaluation

Evaluated using a mix of automatic and manual checks:

  • Structuredness of response (headings, steps, bullets).
  • Correctness on known security remediation tasks.
  • Reduction in obvious hallucinations vs. the base model.
  • Safety checks on disallowed or high-risk content.
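The "structuredness" check above can be approximated with a simple heuristic that counts headings, numbered steps, and bullets in a response. This is a hedged sketch of one possible automatic check, with assumed markdown-style output conventions; the actual evaluation also relied on manual review.

```python
import re

def structure_score(text):
    """Count simple structural markers: markdown-style headings,
    numbered steps, and bullet lines. A rough signal for comparing
    base vs. fine-tuned outputs on the same prompt."""
    lines = text.splitlines()
    headings = sum(bool(re.match(r"\s*#{1,6}\s+\S", ln)) for ln in lines)
    steps = sum(bool(re.match(r"\s*\d+[.)]\s+\S", ln)) for ln in lines)
    bullets = sum(bool(re.match(r"\s*[-*\u2022]\s+\S", ln)) for ln in lines)
    return {"headings": headings, "steps": steps, "bullets": bullets}

reply = ("## Remediation\n"
         "1. Rotate the exposed key\n"
         "2. Patch the host\n"
         "- Verify scan is clean")
print(structure_score(reply))
```

Counting markers is deliberately crude; it rewards stepwise formatting without judging correctness, which the task-level checks cover separately.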

Outcome

The fine-tuned model produced markedly more consistent stepwise remediation plans and was less likely to ignore security constraints than the base model. This work forms part of a broader pipeline for Black Halo / HaloX-style AI-assisted security tooling.

View Project on GitHub →