Persistent Services
Persistent services run continuously alongside your server instead of starting and stopping for each job. They're ideal for tools that need to maintain state, serve APIs, or handle multiple requests without the overhead of a cold start on every call.
Persistent vs standard services
| | Standard Services | Persistent Services |
|---|---|---|
| Lifecycle | Start per job, stop when complete | Run continuously |
| Startup | Cold start for each execution | Warm instance, always ready |
| State | Stateless — each run is independent | Can maintain state across jobs |
| Concurrency | One job at a time | Configurable parallel jobs |
| Billing | Pay per execution | Pay for uptime |
| Best for | Batch processing, one-off tasks | APIs, ML inference, interactive tools |
Choose standard services when each job is independent — file conversion, data processing, simulation runs — and cold start time is acceptable.
Choose persistent services when you need fast response times, maintained state between calls, or the ability to serve an API that multiple tools or users can access simultaneously.
How persistent services work
When a persistent service is started:
- An instance is launched on dedicated compute (EC2) with the configured resources.
- The service container starts and remains running, ready to accept jobs.
- Volumes are mounted so the service has access to data.
- Jobs are routed to the running instance via HTTP — no container startup delay.
- Health monitoring ensures the service stays available, with automatic restarts on failure.
The service keeps running until explicitly stopped or until an idle shutdown timer expires.
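The health-monitoring step above can be sketched as a toy restart loop. Everything here is illustrative (the class and function names are hypothetical, not platform API), but it shows the intended behavior: a failed health check triggers a restart when auto-restart is on.

```python
class ServiceInstance:
    """Hypothetical stand-in for a running persistent-service container."""
    def __init__(self):
        self.healthy = True
        self.restarts = 0

    def health_check(self):
        return self.healthy

    def restart(self):
        self.healthy = True
        self.restarts += 1

def monitor(instance, checks, auto_restart=True):
    """Run a fixed number of health checks, restarting on failure."""
    for _ in range(checks):
        if not instance.health_check() and auto_restart:
            instance.restart()
    return instance.restarts

svc = ServiceInstance()
svc.healthy = False  # simulate a crash before the first check
assert monitor(svc, checks=3) == 1  # restarted once, then stayed healthy
```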
Configuration
Persistent services support several configuration options:
- Auto-start — Automatically start the service when a server launches.
- Auto-restart — Restart the service if it crashes or becomes unhealthy.
- Idle shutdown — Automatically stop the service after a period of inactivity to save costs.
- Health check interval — How often the platform checks if the service is responsive.
- Max queue size — Maximum number of jobs that can be queued when the service is busy.
- Concurrent requests — Number of jobs the service can handle in parallel.
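These options could be collected in a single config object, sketched below. The field names and defaults are assumptions for illustration, not the platform's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PersistentServiceConfig:
    """Illustrative config object; field names are assumed, not the platform schema."""
    auto_start: bool = False                   # start when a server launches
    auto_restart: bool = True                  # restart on crash or failed health check
    idle_shutdown_minutes: Optional[int] = 30  # None disables idle shutdown
    health_check_interval_s: int = 15          # how often responsiveness is checked
    max_queue_size: int = 10                   # jobs queued while the service is busy
    concurrent_requests: int = 1               # parallel jobs

cfg = PersistentServiceConfig(idle_shutdown_minutes=60, concurrent_requests=4)
```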
Instance types
Persistent services run on dedicated compute instances. You can choose from CPU and GPU options:
| Category | Instance Types | Best for |
|---|---|---|
| CPU | General purpose instances | API servers, lightweight processing |
| GPU | NVIDIA T4, V100 instances | ML inference, rendering, GPU-accelerated tasks |
GPU instances come with NVIDIA drivers and the container toolkit pre-configured, so GPU-enabled containers work out of the box.
Creating a persistent service
Persistent services use the same project structure as standard services — the difference is in how they're registered and run on the platform. See Creating a new Service for the base project setup.
When deploying, register the service as persistent through the Agent's persistent service tools. The key differences from standard deployment:
- Configure the instance type appropriate for your workload.
- Set lifecycle options — auto-start, auto-restart, idle shutdown.
- Define concurrency — how many parallel jobs the service should handle.
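Tying these three differences together, a registration payload might look like the following sketch. The keys, the service name, and the `gpu.t4` instance identifier are all assumptions; the Agent's actual persistent service tools may use a different schema.

```python
# Hypothetical registration payload; the Agent's actual tool arguments may differ.
registration = {
    "name": "inference-api",
    "persistent": True,            # register as persistent, not standard
    "instance_type": "gpu.t4",     # assumed identifier for a T4 GPU instance
    "auto_start": True,            # lifecycle options
    "auto_restart": True,
    "idle_shutdown_minutes": 30,
    "concurrent_requests": 4,      # parallel jobs this service can handle
}
```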
Job execution
When the Agent calls a persistent service tool:
- The job is sent to the Persistent Service Manager (PSM).
- PSM routes the job to the running instance.
- The instance executes the job in a Docker container.
- Results are returned to the Agent.
If the service is busy, the job is queued (up to the configured max queue size). If the service is stopped, the platform can auto-start it if configured.
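The bounded-queue behavior can be modeled with a minimal sketch: jobs are accepted until the queue reaches the configured max size, after which new submissions are rejected. This mirrors the described semantics, not the PSM's actual implementation.

```python
from queue import Queue, Full

MAX_QUEUE_SIZE = 2  # stand-in for the configured max queue size
jobs = Queue(maxsize=MAX_QUEUE_SIZE)

def submit(job):
    """Queue a job; reject it once the queue is at max_queue_size."""
    try:
        jobs.put_nowait(job)
        return "queued"
    except Full:
        return "rejected"

results = [submit(f"job-{i}") for i in range(3)]
# The first two jobs fit in the queue; the third exceeds max_queue_size.
assert results == ["queued", "queued", "rejected"]
```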
Service status
Persistent services have the following statuses:
- Running — The service is active and accepting jobs.
- Starting — The instance is being launched and the service is initializing.
- Stopped — The service is not running. No compute costs are incurred.
- Stopping — The service is shutting down gracefully.
- Error — The service encountered a problem. Check logs for details.
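If you track these statuses in your own tooling, an enum keeps the states explicit. This is a sketch under the assumption that only a running service accepts jobs directly (a stopped one may be auto-started first, as noted above); the names mirror the statuses listed, not a platform API.

```python
from enum import Enum

class ServiceStatus(Enum):
    RUNNING = "running"     # active and accepting jobs
    STARTING = "starting"   # instance launching, service initializing
    STOPPED = "stopped"     # not running; no compute costs
    STOPPING = "stopping"   # shutting down gracefully
    ERROR = "error"         # check logs for details

def accepts_jobs(status):
    """Only a running service accepts jobs directly."""
    return status is ServiceStatus.RUNNING

assert accepts_jobs(ServiceStatus.RUNNING)
assert not accepts_jobs(ServiceStatus.STOPPED)
```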
Best practices
- Set idle shutdown for services that aren't used continuously. This avoids paying for compute when the service is inactive.
- Use auto-restart for production services that need high availability.
- Size your instance appropriately — over-provisioning wastes money, under-provisioning causes failures.
- Monitor health checks to catch issues early. If a service becomes unresponsive, the platform will restart it automatically when auto-restart is enabled.
- Design for concurrency if your service will handle multiple simultaneous requests. Ensure your tool implementation is thread-safe.
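The thread-safety point deserves a concrete illustration: if your tool keeps shared state (a counter, a cache, a model handle) and handles concurrent requests, guard mutations with a lock. A minimal sketch:

```python
import threading

class RequestCounter:
    """Shared state guarded by a lock so concurrent jobs don't race."""
    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0

    def increment(self):
        with self._lock:  # serialize the read-modify-write
            self.count += 1

counter = RequestCounter()
workers = [
    threading.Thread(target=lambda: [counter.increment() for _ in range(1000)])
    for _ in range(4)
]
for t in workers:
    t.start()
for t in workers:
    t.join()
assert counter.count == 4000  # no updates lost under concurrency
```

Without the lock, the read-modify-write in `increment` could interleave across threads and drop updates.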

