AI Security & LLM Red Teaming

A set of experiments and tooling around LLM security: prompt injection, jailbreak discovery, dangerous content detection, and output hardening. This connects directly with HaloX / Black Halo security workflows.

Techniques Explored

Prompt injection and instruction-hijack tests.
Automated jailbreak search against policy prompts.
Adversarial examples and perturbation-based attacks.
Safety classifier integration and ensemble-style checks.

Integration With HaloX

Many of the ideas here inform how HaloX and Black Halo use AI in their pipelines: from screening generated remediation to scoring findings for risk, to identifying when an LLM is drifting into unsafe or low-confidence territory.