Last month, as a security researcher I ran an AI coding agent on a legacy monorepo. The agent was asked to refactor a database connection module. Somewhere in its multi-step reasoning, it decided the .env file contained “outdated configuration” and helpfully deleted it. Then it rewrote the database module to use hardcoded connection strings it hallucinated from the codebase’s test fixtures. The CI pipeline caught it. Production was fine. But I spent two hours recovering the environment file from a backup that was three weeks stale.
Nobody was attacked. Nobody was malicious. The agent was doing exactly what agents do — taking autonomous action toward a goal. The problem was not the agent’s intent. The problem was that nothing stopped it.
This is the new reality: AI coding tools are not autocomplete anymore. They are agents with filesystem access, shell execution, and network reach. If you are using them without sandboxing, you are running untrusted code with your credentials. You just happen to trust the vendor.
The New Attack Surface
The shift from AI-assisted code completion to AI-powered code agents happened gradually, then all at once. Tools like Claude Code, Cursor, Aider, and GitHub Copilot Workspace have moved from suggesting the next line to executing multi-step plans that read files, write files, run commands, install packages, and make network requests.
This is a fundamentally different security posture than anything we have dealt with in developer tooling.
A traditional IDE extension reads your code and suggests completions. The blast radius of a failure is a bad suggestion you can ignore. An agentic coding tool reads your code, reasons about it, modifies it, runs tests, installs dependencies, executes shell commands, and iterates until the task is done. The blast radius of a failure is anything the agent can touch — which, in most default configurations, is everything your user account can touch.
Consider the permission spectrum:
┌────────────────────────────────────────────────────────────────────┐
│ AI Coding Tool Permission Spectrum │
├──────────────┬──────────────┬──────────────┬───────────────────────┤
│ Autocomplete │ Read-Only │ Supervised │ Fully Autonomous │
│ (Copilot) │ Analysis │ Agent │ Agent │
├──────────────┼──────────────┼──────────────┼───────────────────────┤
│ Suggests code │ Reads files │ Reads/writes │ Reads/writes files │
│ in editor │ for context │ files with │ Executes commands │
│ │ │ approval │ Installs packages │
│ No file │ No file │ Runs commands│ Makes network calls │
│ access │ modification │ with approval│ Iterates on errors │
│ │ │ │ All without approval │
├──────────────┼──────────────┼──────────────┼───────────────────────┤
│ Blast radius: │ Blast radius:│ Blast radius:│ Blast radius: │
│ None │ None │ Per-action │ Everything your │
│ │ │ │ user can access │
└──────────────┴──────────────┴──────────────┴───────────────────────┘
Most developers, in pursuit of velocity, operate at the right end of this spectrum. They accept every permission prompt, or they configure --dangerously-skip-permissions because the confirmation dialogs break their flow. I understand the impulse. I have done it myself. But if you are running an autonomous agent with your SSH keys, your AWS credentials, your production kubeconfig, and write access to your entire home directory — you are not using a coding tool. You are giving a junior contractor root on your workstation and leaving the room.
What Can Go Wrong
The threat model for AI coding agents breaks into two categories: accidental damage and adversarial exploitation.
Accidental Damage
This is the most common failure mode and the one most developers underestimate. Agents are optimizers — they pursue the goal you gave them with the tools available. They do not have your intuition about what is sacred and what is expendable.
Real scenarios I have seen or been told about:
- Credential exposure. Agent reads
.envfile for “context,” includes API keys in a commit message or code comment during refactoring. - Destructive operations. Agent runs
git reset --hardorrm -rf node_modulesas a “cleanup” step, destroying uncommitted work. - Dependency poisoning. Agent installs a package to solve a problem, does not verify the package’s provenance, introduces a supply chain risk.
- Configuration drift. Agent modifies database config, CI pipeline, or infrastructure-as-code to “fix” an issue, creating a divergence from the known-good state.
- Runaway execution. Agent enters a retry loop on a failing test, consuming API credits or hammering a rate-limited service.
Adversarial Exploitation
I wrote about this in depth in my OpenClaw article, but the short version is: prompt injection is not theoretical anymore. When your agent reads untrusted content — READMEs from npm packages, issue descriptions from GitHub, code comments in a dependency — that content can contain instructions that hijack the agent’s behavior.
The attack surface is particularly insidious because it is invisible. A malicious actor does not need to compromise your machine. They need to get adversarial text into something your agent will read.
┌─────────────────────────────────────────────────────────┐
│ Prompt Injection Attack Flow │
│ │
│ Attacker plants instructions in: │
│ ┌──────────────┐ │
│ │ npm README │──┐ │
│ │ GitHub Issue │──┤ │
│ │ Code Comment │──┼──▶ Agent reads file/page │
│ │ PR Description│──┤ │ │
│ │ Stack Overflow│──┘ ▼ │
│ └──────────────┘ Agent treats content as │
│ instructions │
│ │ │
│ ▼ │
│ Agent executes: │
│ • Exfiltrate .env │
│ • Modify source code │
│ • Install backdoor │
│ • Curl to attacker endpoint │
└─────────────────────────────────────────────────────────┘
The combination of accidental and adversarial risks creates a risk matrix that should concern anyone running these tools in an uncontrolled environment:
| Risk Scenario | Likelihood | Impact | Default Mitigation |
|---|---|---|---|
| Agent deletes/overwrites files | High | Medium | Git history, but uncommitted work is lost |
| Agent exposes credentials in output | Medium | High | None in most default configs |
| Prompt injection via dependency | Low-Medium | Critical | None — agents read everything |
| Agent installs malicious package | Low | Critical | None — agents run install commands |
| Runaway API costs from retry loops | Medium | Medium | Some tools have cost limits |
| Agent modifies CI/CD pipeline | Low | High | Requires code review before merge |
Defense in Depth: Sandboxing Strategies
Security is layers. No single control stops everything. The goal is to make each failure mode encounter at least two independent barriers before it causes real damage.
┌─────────────────────────────────────────────────────────────────┐
│ Defense in Depth Layers │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Layer 5: Monitoring & Audit │ │
│ │ Action logging, diff review, cost alerts │ │
│ │ ┌─────────────────────────────────────────────────────┐ │ │
│ │ │ Layer 4: Credential Isolation │ │ │
│ │ │ Vault injection, short-lived tokens, no creds │ │ │
│ │ │ in project tree │ │ │
│ │ │ ┌───────────────────────────────────────────────┐ │ │ │
│ │ │ │ Layer 3: Network Isolation │ │ │ │
│ │ │ │ Outbound restrictions, proxy controls │ │ │ │
│ │ │ │ ┌─────────────────────────────────────────┐ │ │ │ │
│ │ │ │ │ Layer 2: Filesystem Isolation │ │ │ │ │
│ │ │ │ │ Containers, worktrees, VMs │ │ │ │ │
│ │ │ │ │ ┌───────────────────────────────────┐ │ │ │ │ │
│ │ │ │ │ │ Layer 1: Tool Permission Modes │ │ │ │ │ │
│ │ │ │ │ │ Allowlists, denylists, approval │ │ │ │ │ │
│ │ │ │ │ │ gates │ │ │ │ │ │
│ │ │ │ │ └───────────────────────────────────┘ │ │ │ │ │
│ │ │ │ └─────────────────────────────────────────┘ │ │ │ │
│ │ │ └───────────────────────────────────────────────┘ │ │ │
│ │ └─────────────────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Layer 1: Tool Permission Modes
Every major AI coding tool now ships with some form of permission control. The problem is that the defaults are either too permissive or too interruptive, so developers disable them.
The right approach is not to choose between security and flow. It is to configure permissions that match your actual risk tolerance.
Claude Code provides the most granular model I have seen. Its permission system supports:
- Allowlisted commands: Specify exactly which shell commands the agent can run without approval (
git status,npm test,go build). - Denylisted paths: Prevent the agent from reading or writing specific files or directories (
~/.ssh,~/.aws,.env). - Permission modes: Choose between full approval, auto-approve reads, or bypass all checks — each a deliberate trade-off.
A practical configuration looks like this:
{
"permissions": {
"allow": [
"Bash(git status)",
"Bash(git diff*)",
"Bash(go build*)",
"Bash(go test*)",
"Bash(npm test)",
"Bash(npm run lint)",
"Read(*)",
"Write(src/**)"
],
"deny": [
"Read(~/.ssh/**)",
"Read(~/.aws/**)",
"Read(~/.kube/**)",
"Read(**/.env*)",
"Write(~/**)",
"Bash(rm -rf*)",
"Bash(git push*)",
"Bash(git reset --hard*)",
"Bash(curl*)",
"Bash(wget*)"
]
}
}
This configuration lets the agent do its job — read code, write to the source directory, run builds and tests — while blocking the operations that cause real damage: reading credentials, writing outside the project, deleting files, pushing code, or making network requests.
The key insight is to deny destructive and exfiltration-capable operations by default, then allowlist the specific safe operations you need. This is the principle of least privilege applied to your AI coding tool. It is the same principle you apply to IAM roles, service accounts, and container security policies. The agent is a principal. Treat it like one.
Layer 2: Filesystem Isolation
Permission modes control what the agent is allowed to do. Filesystem isolation controls what the agent can see.
Git worktrees are the lightest-weight option. A worktree creates a separate working copy of your repository at a different path, linked to the same .git directory. The agent operates in the worktree. Your main working directory is untouched. If the agent makes a mess, you delete the worktree.
# Create an isolated worktree for agent work
git worktree add ../project-agent-sandbox feature/ai-refactor
# Point the agent at the worktree
cd ../project-agent-sandbox
# When done, review changes and merge or discard
cd ../project
git worktree remove ../project-agent-sandbox
Worktrees protect your uncommitted work and give you a clean review boundary, but they share the same filesystem — the agent can still read ~/.ssh if permissions allow it.
Docker containers provide real isolation. The agent runs inside a container with only the project directory mounted. No access to your home directory, no access to your credentials, no access to your host network unless you explicitly grant it.
FROM node:20-slim
# Create a non-root user for the agent
RUN useradd -m -s /bin/bash agent
WORKDIR /workspace
# Install only the tools the agent needs
RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*
USER agent
# docker-compose.yml for sandboxed AI coding
services:
ai-sandbox:
build: .
volumes:
- ./src:/workspace/src
- ./tests:/workspace/tests
- ./package.json:/workspace/package.json:ro
- ./tsconfig.json:/workspace/tsconfig.json:ro
# No home directory mount
# No SSH keys
# No cloud credentials
# No Docker socket
network_mode: "none" # No network access
read_only: true
tmpfs:
- /tmp
security_opt:
- no-new-privileges:true
The network_mode: "none" is critical. Without network access, even a fully compromised agent cannot exfiltrate data. If the agent needs to install packages, you pre-install them in the image or mount a local package cache.
Lightweight VMs (Lima on macOS, Firecracker on Linux) provide the strongest isolation. The agent runs in a separate kernel with its own filesystem, network stack, and process space. This is the gold standard for running untrusted workloads, but it adds startup time and complexity that may not be justified for typical development workflows.
Choose the isolation level that matches your threat model:
| Method | Setup Cost | Isolation Level | Performance | Best For |
|---|---|---|---|---|
| Git worktrees | Low | Work-in-progress protection | Native | Protecting uncommitted changes |
| Docker containers | Medium | Filesystem + network | Near-native | Daily development with AI agents |
| Lightweight VMs | High | Full kernel-level | Moderate overhead | Running untrusted agents or code |
Layer 3: Network Isolation
An agent that cannot make outbound network requests cannot exfiltrate data. This is the single most effective control against adversarial prompt injection.
If you are running the agent in a container, network_mode: "none" handles this. If you are running it on your host, you need a different approach.
On macOS, application-level firewalls like Little Snitch or Lulu can restrict which processes are allowed to make outbound connections. On Linux, iptables or nftables rules scoped to a specific user or cgroup achieve the same effect.
For environments where the agent needs some network access (pulling documentation, running API tests), use an outbound proxy that allowlists specific domains:
# Squid proxy config — allow only essential domains
acl ai_allowed dstdomain .github.com
acl ai_allowed dstdomain .npmjs.org
acl ai_allowed dstdomain .golang.org
acl ai_allowed dstdomain .pkg.dev
http_access allow ai_allowed
http_access deny all
The agent gets the network access it needs for legitimate operations. Everything else is blocked.
Layer 4: Credential Isolation
The most dangerous thing on your filesystem is not your source code. It is your credentials. An agent that reads ~/.aws/credentials or ~/.kube/config has access to your cloud infrastructure. An agent that reads ~/.ssh/id_rsa can authenticate as you to any system that trusts your key.
The rules are simple:
- Never store credentials in the project directory. No
.envfiles with real secrets in the repo. Use a secrets manager or inject credentials at runtime. - Use short-lived tokens. AWS session tokens that expire in one hour are less dangerous than permanent access keys.
aws sts get-session-tokenwith a tight duration. - Use separate credential profiles for agent work. If you must give an agent AWS access, create a dedicated IAM role with minimum permissions and a separate profile.
- Mount credentials read-only when containerized. If the agent needs specific credentials, mount only the specific file, read-only, not your entire
~/.awsdirectory.
# Instead of this (exposes all AWS credentials and configs):
docker run -v ~/.aws:/home/agent/.aws my-sandbox
# Do this (exposes only a session token, read-only):
aws sts get-session-token --duration-seconds 3600 > /tmp/agent-creds.json
docker run -v /tmp/agent-creds.json:/home/agent/.aws/credentials:ro my-sandbox
Layer 5: Monitoring and Audit
The final layer is visibility. You need to know what the agent did, even if it operated within its permissions.
Action logging. Claude Code logs every tool call, every file read, every command execution. Review these logs after agent sessions, especially when the agent operated autonomously. Other tools have varying levels of logging — if yours does not log, treat that as a missing security control.
Git diffs before commit. Never let an agent commit directly. Always review the diff. git diff --stat gives you the scope. git diff gives you the details. Automated pre-commit hooks can scan for common mistakes:
#!/bin/bash
# .git/hooks/pre-commit — catch common agent mistakes
# Check for hardcoded credentials
if git diff --cached | grep -iE '(api_key|secret|password|token)\s*[:=]\s*["\x27][^"\x27]{8,}'; then
echo "ERROR: Possible hardcoded credentials detected in staged changes."
echo "Review the diff and remove any secrets before committing."
exit 1
fi
# Check for .env files being committed
if git diff --cached --name-only | grep -E '\.env($|\.)'; then
echo "ERROR: Attempting to commit .env file."
exit 1
fi
# Check for overly broad file deletions
DELETED=$(git diff --cached --diff-filter=D --name-only | wc -l)
if [ "$DELETED" -gt 5 ]; then
echo "WARNING: $DELETED files deleted in this commit. Please confirm this is intentional."
exit 1
fi
Cost monitoring. AI agents consume tokens. Agentic loops consume a lot of tokens. Set budget alerts. Claude Code reports cost per session. Track it. A runaway agent loop is a financial risk in addition to a security risk.
My Setup: Balancing Security and Velocity
I run Claude Code daily. It is my primary development tool for most of the work I do. Here is how I balance the security controls I just described with the velocity that makes these tools worth using.
For trusted projects (my own repos, active development):
- Claude Code with curated permission allowlist — reads are open, writes are scoped to
src/, destructive commands are denied - Git worktree for any experimental or large-scale refactoring work
- Pre-commit hooks for credential scanning and large deletion warnings
- Regular review of Claude’s action log
For untrusted or unfamiliar codebases (auditing third-party code, exploring new dependencies):
- Docker container with no network access
- Project directory mounted read-only
- Agent output captured to a separate directory for review
- No credentials mounted, period
For infrastructure and operations work (Terraform, Kubernetes, CI/CD):
- Full approval mode — every command requires my explicit confirmation
- Dedicated cloud credentials with read-only permissions where possible
- Never give an agent
terraform applyorkubectl deletewithout manual review
The key is that these are not three different tools. They are three configurations of the same tool, chosen based on the risk profile of the task.
The CISO’s Playbook: Policy for AI Coding Tools
If you are a security leader, your developers are already using AI coding agents. The question is whether they are using them securely. I wrote about this in my Shadow AI piece — the worst outcome is not that your developers use AI tools. It is that they use them without guardrails because you did not provide a sanctioned path.
Here is a policy framework that works:
1. Approved tool list with required configurations. Do not just approve “Claude Code.” Approve “Claude Code with the following minimum permission configuration” and provide the settings file. Make the secure path the easy path.
2. Credential hygiene requirements. No permanent credentials in developer home directories. Mandate short-lived tokens via SSO or STS. This is good practice regardless of AI tools — AI agents just make the existing risk of credential sprawl more acute.
3. Sandbox requirements by risk tier.
- Low risk (personal projects, learning, prototyping): Permission modes sufficient
- Medium risk (internal tools, non-production code): Container isolation recommended
- High risk (production code, regulated data, infrastructure): Container isolation required, network restrictions, credential isolation
4. Audit and review requirements. All AI-generated code must go through the same code review process as human-written code. No exceptions. The reviewer should be aware that the code was AI-generated, because the failure modes are different — AI-generated code tends to look correct while containing subtle logic errors or outdated patterns.
5. Incident response playbook. Add “AI agent compromise” to your IR scenarios. Key steps: revoke any credentials the agent had access to, review the agent’s action log, check for unauthorized commits or file modifications, scan for introduced dependencies.
Compliance Considerations
For organizations operating under SOC 2, ISO 27001, NYS DFS 500, or similar frameworks, AI coding agents introduce specific compliance questions:
Access control (SOC 2 CC6.1). AI agents that access production data, credentials, or infrastructure must be treated as system components with documented access controls. “A developer ran Claude Code on their laptop” is not a sufficient access control narrative.
Change management (SOC 2 CC8.1). AI-generated code changes must follow your documented change management process. If your policy requires peer review before merge, that applies to AI-generated changes. If your policy requires testing before deployment, the agent running tests does not satisfy the “independent verification” requirement.
Audit logging (SOC 2 CC7.2). If your agents are generating code that touches regulated systems, you need an audit trail of what the agent did. This is one area where tools like Claude Code, which log every action, actually make compliance easier — provided you are capturing and retaining those logs.
What Is Coming Next
The direction is clear: AI coding tools are becoming more autonomous, not less. MCP (Model Context Protocol) is expanding the surface area of what agents can interact with — GitHub, Jira, Slack, cloud consoles, databases. The agent that today reads your filesystem will tomorrow read your Slack channels, query your production database, and open pull requests on your behalf.
This is not a reason to avoid these tools. The productivity gains are real and significant. But the security posture must evolve at the same pace as the capability.
The organizations that get this right will treat AI coding agents the way they treat any other privileged access: with identity management, least-privilege permissions, network segmentation, audit logging, and incident response planning.
The organizations that get this wrong will find out when an agent reads the wrong file, sends the wrong request, or follows the wrong instruction.
Sandbox your filesystem. Isolate your credentials. Log everything. Review the diff.
The agent is powerful. The sandbox makes it safe.