RAG-powered static analysis combined with multi-turn red-teaming simulation to surface vulnerabilities, logic bugs, and missing guardrails in your agent configuration — before reaching production.
From static policy scanning to live adversarial simulation, Dobbies covers the full threat surface of a deployed language model agent.
Paste your public GitHub repo URL — the system automatically detects all agent files, extracts system prompts and tool definitions, and surfaces them as an audit-ready list.
Each agent's system prompt and tool definitions are matched against a local OWASP LLM Top 10 knowledge base using keyword retrieval — flagging injection risks, secret leakage, and over-privileged tools.
A dedicated Attacker LLM sends multi-turn adversarial messages — prompt injection, social engineering, privilege escalation — directly to a TypeScript mock sandbox of your agent.
Produces ready-to-use guardrail configurations — Llama Guard rules, NeMo Guardrails configs, and regex output filters — tailored to the exact vulnerabilities found in your agent.
Orchestration
A robust, 4-stage automated auditing framework designed specifically for AI Agents.
Scan your GitHub repo and detect all agent definitions
Connect your public GitHub repository by pasting its URL. The system calls the GitHub API — authenticated via your GitHub login — to recursively scan all files. It detects agent definitions by matching filename patterns and scanning file content for system prompts, tool schemas, and agent configuration structures. Every detected agent is surfaced as a card in your dashboard, ready to audit.
Match agent config against OWASP LLM Top 10 security rules
The selected agent's system prompt and tool definitions are scanned against a curated OWASP LLM Top 10 knowledge base using keyword and pattern retrieval. Each matched rule raises a finding — flagging issues like hardcoded secrets in system prompts, unrestricted tool permissions, missing output filters, or prompt injection exposure — with a severity level and a concrete remediation step.
Attacker LLM probes the agent in a TypeScript mock sandbox
A specialized Attacker LLM sends adversarial multi-turn messages — social engineering, jailbreak attempts, privilege escalation, and destructive tool invocations — to a TypeScript mock sandbox that mirrors your agent's actual configuration. The sandbox captures every exchange and flags the exact turn where the agent discloses secrets, executes dangerous tool calls, or deviates from its safety constraints.
Score security posture and generate ready-to-use guardrail configs
An Evaluator LLM reviews the full simulation transcript and produces two scores: a static score based on configuration analysis, and a dynamic score based on how many adversarial attacks the agent successfully repelled. Each detected vulnerability is paired with a specific remediation and a ready-to-download guardrail configuration — Llama Guard rules, NeMo Guardrails configs, and regex output filters.
Scope & Impact
We categorize vulnerabilities by their business and protocol impact. Understanding our in-scope boundaries helps you know exactly what Dobbies defends against.
Our auditing framework is exclusively calibrated for the unique attack surfaces of autonomous agents. We actively simulate prompt injections, evaluate the integrity of system instructions, test for unauthorized function calling loops, and detect the leakage of proprietary context from vector databases. If an exploit requires interacting with the agent's logic or orchestration layer, it is strictly within our testing purview.
We do not replicate traditional network scanners. Intrusions targeting underlying server infrastructure, Kubernetes cluster misconfigurations, standard frontend web vulnerabilities, or base model parameter extraction are explicitly excluded. Our focus remains resolutely on the agentic behavior — leaving traditional cloud boundaries to your existing security protocols.
FAQ
Everything you need to know about the product and how it integrates into your workflow.
Dobbies is an automated security auditor for AI Agents. It scans system prompts, tool schemas, and agentic workflows for vulnerabilities, logic bugs, and missing guardrails before they reach production.
Dobbies runs a dual-stage audit: a static vulnerability scan using RAG (retrieving relevant OWASP LLM Top 10 rules) followed by an automated dynamic red-teaming simulation where an adversarial LLM attempts prompt injection and privilege escalation in a secure sandbox.
Absolutely. We prioritize your privacy. Dobbies runs audits in stateless, secure environments. Your configurations, prompts, and tool definitions are only used during the active audit, stored securely in your private history, and never used to train models.
To safely test tool access (like database queries or terminal execution) without risking your infrastructure, Dobbies simulates tools in a mocked TypeScript sandbox. This evaluates if the agent attempts destructive actions under pressure, without actual risk to your systems.
Dobbies is framework-agnostic. You can audit system prompts and tools from LangChain, LlamaIndex, CrewAI, AutoGen, or any custom API-driven agentic architectures by simply pasting your prompts and tool specifications.
Get comprehensive security insights and protect your AI agents from the latest threats.