Sandbox

Isolated environment for running LLM tasks locally and in the cloud.


AI models have access to running tools: reading local files via bash commands, or editing files directly with string-replace commands. A coding tool like Claude Code has potentially unlimited control over your machine when configured with the "--dangerously-skip-permissions" flag. The challenge with such broad access is how to grant those permissions while still containing the AI inside an externally managed environment. This is where sandboxes come in.


Types of sandboxes

Local sandbox

Claude Code is a good example of a fully autonomous tool running inside a controlled sandbox on your machine. It is built on top of macOS Seatbelt (via sandbox-exec) to enforce both filesystem and network isolation for every bash command Claude executes. This gives Claude Code a defined environment in which to run commands and edits, with isolation configured by the user or the app.
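As a rough illustration of the OS facility this builds on, a Seatbelt profile like the sketch below (paths and rules are placeholder assumptions, not Claude Code's actual policy) can be passed to sandbox-exec to confine a command to one project directory and cut off the network:

```scheme
; policy.sb -- illustrative Seatbelt profile (SBPL); paths are placeholders.
; A real profile needs many more allowances to run useful programs.
(version 1)
(deny default)                                   ; start from "deny everything"
(allow process-exec)                             ; let the command itself run
(allow file-read* (subpath "/usr/lib") (subpath "/System"))
(allow file-read* file-write* (subpath "/Users/me/project"))
(deny network*)                                  ; no inbound or outbound network
```

You would run a command under this policy with something like `sandbox-exec -f policy.sb bash -c "..."`; note that sandbox-exec is an Apple-internal, lightly documented tool, so treat this as a sketch rather than a recipe.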

Cloud sandbox

I find cloud sandboxes to be the ultimate weapon for building apps around AI models with agent frameworks such as the Claude Agent SDK. You can write an entire app in Python, define agent tasks, import task context, and finally ship it all into a cloud sandbox. Inside this ephemeral cloud computer, the AI agent does the work your app defines. For example, you could write an accounting app that closes your books in the cloud and has the agent return the profits and losses every day of every week.

There are several great sandbox virtual-machine services available, but I prefer E2B because of its simple Python SDK and excellent CLI tools for managing usage. Here, creating sandbox "templates" is high leverage: you pre-build a template machine with the packages your tool needs, so a sandbox spun up from it starts almost instantly instead of downloading and building everything on every run.
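The template flow looks roughly like the sketch below. This is an assumption-laden illustration of the E2B Python SDK, not a verified snippet: the template id and script name are placeholders, and the exact method names should be checked against the current E2B docs before use.

```python
# Illustrative sketch of the E2B flow; requires an E2B API key and
# network access, so it will not run as-is. Names are placeholders.
from e2b import Sandbox

# Boot a sandbox from a pre-built template so the packages baked into
# the template are available immediately, instead of installing on boot.
sandbox = Sandbox(template="my-accounting-template")  # placeholder template id
try:
    # Run the agent's task inside the ephemeral machine.
    result = sandbox.commands.run("python close_books.py")  # placeholder script
    print(result.stdout)
finally:
    sandbox.kill()  # tear the ephemeral machine down when done
```

The template itself is built ahead of time with the E2B CLI, which is where the "run almost instantly" property comes from.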


Isolation

Sandboxes are helpful when you want to do more than is possible in a single environment. You can spin up effectively infinite sandboxes, both locally and in the cloud, and run isolated tasks within them. The separation helps when you are managing many environments and want to experiment: sandboxes allow near-infinite replication of environments, letting you simulate a wide range of scenarios and configurations with minimal overhead.
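A minimal local sketch of this replication idea: each task gets its own throwaway working directory and its own interpreter process, so runs cannot clobber each other's files. (This is plain stdlib Python, a toy stand-in for a real sandbox.)

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_isolated(task_id: int) -> str:
    """Run a small script in its own throwaway working directory."""
    with tempfile.TemporaryDirectory() as workdir:
        # Each task sees only its own empty directory; nothing leaks
        # between runs, and the directory is deleted afterwards.
        script = (
            "from pathlib import Path; "
            f"Path('out.txt').write_text('task {task_id}')"
        )
        subprocess.run([sys.executable, "-c", script], cwd=workdir, check=True)
        return (Path(workdir) / "out.txt").read_text()

results = [run_isolated(i) for i in range(3)]
print(results)  # ['task 0', 'task 1', 'task 2']
```

The same shape scales up: swap the temporary directory for a cloud sandbox and the pattern of "replicate environment, run task, throw it away" stays identical.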

Given the unpredictable nature of AI models, they can output anything: they are probability machines housed in data centers. When you build with these models, run tasks, and use them in your work and life, you must sandbox their outputs whenever those outputs can be executed. If you collect LLM output as, say, JSON or Python and intend to run that code, always sandbox it.
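One low-effort layer of that discipline, sketched with the stdlib: never hand model-generated code to the host interpreter; run it in a separate process with a timeout and an empty scratch directory. This is containment-lite, not real isolation, so a proper OS-level sandbox should sit underneath it.

```python
import subprocess
import sys
import tempfile

def run_untrusted(code: str, timeout: float = 5.0) -> subprocess.CompletedProcess:
    """Execute model-generated code in a separate process, never in
    the host interpreter. A crash, infinite loop, or file write stays
    in the child process and its scratch directory."""
    with tempfile.TemporaryDirectory() as scratch:
        return subprocess.run(
            # -I puts Python in isolated mode: no env vars, no user site-packages
            [sys.executable, "-I", "-c", code],
            cwd=scratch,
            capture_output=True,
            text=True,
            timeout=timeout,
        )

llm_output = "print(2 + 2)"  # pretend this string came from the model
proc = run_untrusted(llm_output)
print(proc.stdout.strip())  # 4
```

For anything beyond toy use, the child process itself belongs inside a Seatbelt profile, a container, or a cloud sandbox, since a subprocess alone can still touch the network and the filesystem.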


Giving the agent a space to work

Fundamentally, a virtual machine or local sandbox is the tool that gives your agent-first app the space to do its work. The limitations of a web app force you to use a sandbox that can be called, turned on, and spawned on demand from your app.


I designed a tool called Cocentives that orchestrates this kind of sandbox-first architecture. From the ground up, Cocentives relies on agents inside sandboxes to do the core backend work. One example task is agent identification: once a user initializes a new report run, uploads files, and configures the basics, the agent picks up the work inside a cloud sandbox. From there, the task iterates through a) agent identification, b) context orchestration, c) commission extraction, and d) validation with Python math.

This entire flow runs inside sandboxes. For maximum efficiency, the commission-extraction task runs across parallel sandboxes. This makes the application incredibly fast and the user experience snappy compared to doing the work sequentially.
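The fan-out can be sketched with the stdlib. Here each "sandbox" is simulated by a function with an artificial delay (the function name and chunking are illustrative, not Cocentives internals); the point is that N tasks dispatched together take roughly the time of one, not N.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def extract_commissions(chunk: list[int]) -> int:
    """Stand-in for one per-sandbox extraction task. A real task would
    boot a sandbox and run the agent; here we just sum with a delay."""
    time.sleep(0.1)  # simulate sandbox work
    return sum(chunk)

chunks = [[1, 2], [3, 4], [5, 6]]  # e.g. one chunk of documents per sandbox

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
    totals = list(pool.map(extract_commissions, chunks))
elapsed = time.perf_counter() - start

print(totals)  # [3, 7, 11]
# The three 0.1 s tasks overlap, so elapsed is near 0.1 s, not 0.3 s.
```

With real cloud sandboxes the dispatch loop looks the same; only the body of the worker changes from a sleep to "spin up sandbox, run task, collect result".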