
Most AI systems today rely on:
- prompt engineering
- guardrails at the model level
- post-hoc logging
That works… until it doesn’t.
Once you introduce:
- tools (APIs, DBs)
- RAG pipelines
- multi-step agents
things start breaking in ways that are hard to predict.
So I built something different.
🎥 Demo — Attack → Detection → Decision → Trace
👉 [(https://www.youtube.com/watch?v=OucfJ6_wcTM&t)]
This shows a full flow:
- Attack execution
- Prompt inspection
- Policy enforcement
- Decision (block / allow)
- Full trace in UI
The Problem
While testing LLM-based systems in production-like setups, I kept running into:
- prompt injection bypasses
- unintended tool execution
- data leakage through chaining
- lack of visibility into decisions
The biggest issue wasn’t detection.
It was lack of enforcement.
What I Built
A runtime security system for AI agents.
Not just “guardrails”—but actual enforcement during execution.
Core ideas:
- Treat the LLM as untrusted
- Validate every step
- Control tool access explicitly
- Track everything in real time
Think of it like:
Zero-trust architecture… but for AI systems
How It Works (High-Level)
-
Input Inspection
- Analyze prompt + context
- detect anomalies
-
Policy Enforcement
- allow / block / escalate
- based on structured rules
-
Tool Control
- no free-form execution
- only validated actions
-
Decision Trace
- full visibility into what happened
- why it was allowed or blocked
Testing It with Real Attacks
I used :contentReference[oaicite:0]{index=0} to simulate attacks.
Examples:
- prompt injection
- tool misuse
- data exfiltration attempts
What happens in the system:
- prompt is intercepted
- decoded (if obfuscated)
- evaluated against policies
- decision is enforced
- trace is recorded
What Surprised Me
1. Attacks often look harmless at first
Many inputs don’t look malicious until they interact with tools.
2. Detection alone is not enough
Logging ≠ security. You need runtime control.
3. Explainability is critical
Understanding why something was blocked is just as important as blocking it.
Architecture (Simplified)
- FastAPI backend
- Event pipeline (Kafka-style)
- Policy engine (OPA-style decisions)
- React UI
- Simulation + replay
Try to Break It
If you’re into AI or security:
Try to break the system.
- craft a prompt
- bypass detection
- trigger unintended behavior
If you succeed:
- open an issue
- or submit a PR
We’ll add your attack to the test suite.
Want to Contribute?
GitHub: https://github.com/dshapi/AI-SPM
Good starting points:
- expose attack traces
- improve policy explanations
- strengthen detection
Final Thought
AI systems are becoming:
- more autonomous
- more connected
- more powerful
Which means:
Security can’t be an afterthought.
It has to be part of the runtime.
Curious to hear:
- What attacks would you try?
- Where do you think this breaks?
United States
NORTH AMERICA
Related News
What Does "Building in Public" Actually Mean in 2026?
20h ago
The Agentic Headless Backend: What Vibe Coders Still Need After the UI Is Done
20h ago
Why I’m Still Learning to Code Even With AI
22h ago
Students Boo Commencement Speaker After She Calls AI the 'Next Industrial Revolution'
5h ago

Testing for ‘Bad Cholesterol’ Doesn’t Tell the Whole Story
5h ago
