LLM Fine-Tuning With LoRA
Fine-tuned an open-source LLM using LoRA adapters on security- and infrastructure-focused instructions. The goal was to improve structured remediation guidance and reduce hallucinations in technical workflows.
Model & Dataset
- Base model: 7B–8B open-source LLM (Llama-style architecture).
- Fine-tuning with PEFT LoRA adapters (QLoRA-style quantization).
- Custom dataset of security tickets, scan findings, remediation steps, and cloud infrastructure tasks.
- Split into train / validation with held-out eval prompts.
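The train / validation split with held-out eval prompts can be sketched as below. This is a minimal illustration, not the project's actual code: the record fields (`instruction`, `response`), split fraction, and held-out count are all assumptions.

```python
import random

def split_dataset(examples, val_fraction=0.1, n_held_out=20, seed=42):
    """Shuffle and split instruction examples into train / validation,
    reserving a fixed set of held-out eval prompts.

    NOTE: val_fraction and n_held_out are illustrative assumptions.
    """
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    held_out = shuffled[:n_held_out]   # never seen during training or validation
    rest = shuffled[n_held_out:]
    n_val = int(len(rest) * val_fraction)
    return rest[n_val:], rest[:n_val], held_out

# Dummy records standing in for security tickets / scan findings:
examples = [{"instruction": f"remediate finding {i}", "response": "..."}
            for i in range(200)]
train, val, held_out = split_dataset(examples)
```

Fixing the shuffle seed keeps the held-out prompts stable across runs, so base-model and fine-tuned outputs are compared on identical inputs.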
Training Setup
- LoRA rank ~32, targeting the attention projection modules.
- Optimizer: AdamW, learning rate in the 2e-4 range.
- Trained on a single GPU with gradient accumulation.
- Checkpointing and early stopping based on eval loss.
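The early-stopping criterion on eval loss can be sketched with a small helper like the one below. The patience and tolerance values are assumptions; in a real run this logic would sit behind the trainer's evaluation callback rather than stand alone.

```python
class EarlyStopper:
    """Stop training when eval loss has not improved for `patience` evals.

    NOTE: patience and min_delta are illustrative assumptions.
    """
    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, eval_loss):
        """Record one eval loss; return True if training should stop."""
        if eval_loss < self.best - self.min_delta:
            self.best = eval_loss   # a checkpoint would be saved at this point
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience

stopper = EarlyStopper(patience=2)
stops = [stopper.step(loss) for loss in [1.0, 0.8, 0.85, 0.9]]
# stops → [False, False, False, True]: two non-improving evals trigger the stop
```

Saving a checkpoint only on a new best eval loss means the final adapter weights come from the best-generalizing step, not the last one.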
Evaluation
Evaluated using a mix of automatic and manual checks:
- Structuredness of response (headings, steps, bullets).
- Correctness on known security remediation tasks.
- Reduction in obvious hallucinations vs. the base model.
- Safety checks on disallowed or high-risk content.
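The "structuredness" check above can be approximated with simple text heuristics, as in the sketch below. The signals (headings, numbered steps, bullets) and their weights are assumptions about what the manual review scored, not the project's actual metric.

```python
import re

def structuredness_score(response: str) -> float:
    """Heuristic 0-1 score for how structured a remediation answer is,
    based on headings, numbered steps, and bullet points.

    NOTE: the regexes and weights are illustrative assumptions.
    """
    lines = response.splitlines()
    has_heading = any(re.match(r"^(#+\s|[A-Z][\w &]+:\s*$)", l) for l in lines)
    n_steps = sum(bool(re.match(r"^\s*\d+[.)]\s", l)) for l in lines)
    n_bullets = sum(bool(re.match(r"^\s*[-*]\s", l)) for l in lines)
    score = 0.0
    if has_heading:
        score += 0.4    # response has a section heading
    if n_steps >= 2:
        score += 0.4    # response lays out numbered steps
    if n_bullets >= 2:
        score += 0.2    # response uses bullet lists
    return score

answer = ("Remediation:\n"
          "1. Rotate the leaked key\n"
          "2. Revoke old sessions\n"
          "- verify audit logs\n"
          "- re-scan the host")
```

A heuristic like this is cheap enough to run over every held-out prompt, with manual review reserved for correctness and safety.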
Outcome
The fine-tuned model became significantly better at producing stepwise remediation plans and less likely to ignore security constraints. This work forms part of a broader pipeline for Black Halo / HaloX-style AI-assisted security tooling.