LLM Fine-Tuning With LoRA
Fine-tuned an open-source LLM using LoRA adapters on security- and infrastructure-focused instructions. The goal was to improve structured remediation guidance and reduce hallucinations in technical workflows.
Model & Dataset
- Base model: 7B–8B open-source LLM (Llama-style architecture).
- Fine-tuning with PEFT LoRA adapters (QLoRA-style quantization).
- Custom dataset of security tickets, scan findings, remediation steps, and cloud infrastructure tasks.
- Split into train / validation with held-out eval prompts.
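The train / validation split with held-out eval prompts can be sketched as below. This is a minimal illustration, not the project's actual code: the record fields (`instruction`, `response`), split fraction, and held-out count are all assumptions.

```python
import random

def split_dataset(examples, val_fraction=0.1, n_held_out=20, seed=42):
    """Shuffle and split instruction examples into train / validation,
    reserving a fixed set of held-out eval prompts.

    NOTE: val_fraction and n_held_out are illustrative assumptions.
    """
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    held_out = shuffled[:n_held_out]   # never seen during training or validation
    rest = shuffled[n_held_out:]
    n_val = int(len(rest) * val_fraction)
    return rest[n_val:], rest[:n_val], held_out

# Dummy records standing in for security tickets / scan findings:
examples = [{"instruction": f"remediate finding {i}", "response": "..."}
            for i in range(200)]
train, val, held_out = split_dataset(examples)
```

Fixing the shuffle seed keeps the held-out prompts stable across runs, so base-model and fine-tuned outputs are compared on identical inputs.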
Training Setup
- LoRA rank ~32, targeting the attention projection modules.
- Optimizer: AdamW, learning rate in the 2e-4 range.
- Trained on a single GPU with gradient accumulation.
- Checkpointing and early stopping based on eval loss.
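The early-stopping criterion on eval loss can be sketched with a small helper like the one below. The patience and tolerance values are assumptions; in a real run this logic would sit behind the trainer's evaluation callback rather than stand alone.

```python
class EarlyStopper:
    """Stop training when eval loss has not improved for `patience` evals.

    NOTE: patience and min_delta are illustrative assumptions.
    """
    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, eval_loss):
        """Record one eval loss; return True if training should stop."""
        if eval_loss < self.best - self.min_delta:
            self.best = eval_loss   # a checkpoint would be saved at this point
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience

stopper = EarlyStopper(patience=2)
stops = [stopper.step(loss) for loss in [1.0, 0.8, 0.85, 0.9]]
# stops → [False, False, False, True]: two non-improving evals trigger the stop
```

Saving a checkpoint only on a new best eval loss means the final adapter weights come from the best-generalizing step, not the last one.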
Evaluation
Evaluated using a mix of automatic and manual checks:
- Structuredness of response (headings, steps, bullets).
- Correctness on known security remediation tasks.
- Reduction in obvious hallucinations vs. the base model.
- Safety checks on disallowed or high-risk content.
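The "structuredness" check above can be approximated with simple text heuristics, as in the sketch below. The signals (headings, numbered steps, bullets) and their weights are assumptions about what the manual review scored, not the project's actual metric.

```python
import re

def structuredness_score(response: str) -> float:
    """Heuristic 0-1 score for how structured a remediation answer is,
    based on headings, numbered steps, and bullet points.

    NOTE: the regexes and weights are illustrative assumptions.
    """
    lines = response.splitlines()
    has_heading = any(re.match(r"^(#+\s|[A-Z][\w &]+:\s*$)", l) for l in lines)
    n_steps = sum(bool(re.match(r"^\s*\d+[.)]\s", l)) for l in lines)
    n_bullets = sum(bool(re.match(r"^\s*[-*]\s", l)) for l in lines)
    score = 0.0
    if has_heading:
        score += 0.4    # response has a section heading
    if n_steps >= 2:
        score += 0.4    # response lays out numbered steps
    if n_bullets >= 2:
        score += 0.2    # response uses bullet lists
    return score

answer = ("Remediation:\n"
          "1. Rotate the leaked key\n"
          "2. Revoke old sessions\n"
          "- verify audit logs\n"
          "- re-scan the host")
```

A heuristic like this is cheap enough to run over every held-out prompt, with manual review reserved for correctness and safety.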
Outcome
The fine-tuned model became significantly better at producing stepwise remediation plans and less likely to ignore security constraints. This work forms part of a broader pipeline for Black Halo / HaloX-style AI-assisted security tooling.