
Persistent Services

Persistent services run continuously alongside your server instead of starting and stopping for each job. They're ideal for tools that need to maintain state, serve APIs, or handle multiple requests without the overhead of a cold start on every call.

Persistent vs standard services

|             | Standard Services                   | Persistent Services                    |
|-------------|-------------------------------------|----------------------------------------|
| Lifecycle   | Start per job, stop when complete   | Run continuously                       |
| Startup     | Cold start for each execution       | Warm instance, always ready            |
| State       | Stateless — each run is independent | Can maintain state across jobs         |
| Concurrency | One job at a time                   | Configurable parallel jobs             |
| Billing     | Pay per execution                   | Pay per uptime                         |
| Best for    | Batch processing, one-off tasks     | APIs, ML inference, interactive tools  |

Choose standard services when each job is independent — file conversion, data processing, simulation runs — and cold start time is acceptable.

Choose persistent services when you need fast response times, maintained state between calls, or the ability to serve an API that multiple tools or users can access simultaneously.

How persistent services work

When a persistent service is started:

  1. An instance is launched on dedicated compute (EC2) with the configured resources.
  2. The service container starts and remains running, ready to accept jobs.
  3. Volumes are mounted so the service has access to data.
  4. Jobs are routed to the running instance via HTTP — no container startup delay.
  5. Health monitoring ensures the service stays available, with automatic restarts on failure.

The service keeps running until explicitly stopped or until an idle shutdown timer expires.
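The idle-shutdown behaviour can be sketched as a small watchdog that tracks the time of the last job and stops the service once the idle window expires. This is a minimal illustration, not the platform's implementation; the names `IdleWatchdog`, `touch`, and `on_idle` are ours.

```python
import threading
import time


class IdleWatchdog:
    """Stops a service after `idle_timeout` seconds without activity.

    Illustrative sketch: `on_idle` stands in for whatever the platform
    actually does to shut the instance down.
    """

    def __init__(self, idle_timeout, on_idle):
        self.idle_timeout = idle_timeout
        self.on_idle = on_idle
        self._last_activity = time.monotonic()
        self._lock = threading.Lock()
        self._stopped = False

    def touch(self):
        """Record activity (e.g. a job arriving), resetting the timer."""
        with self._lock:
            self._last_activity = time.monotonic()

    def poll(self):
        """Check the timer once; returns True if the service was stopped."""
        with self._lock:
            idle_for = time.monotonic() - self._last_activity
        if not self._stopped and idle_for >= self.idle_timeout:
            self._stopped = True
            self.on_idle()
        return self._stopped


stopped = []
watchdog = IdleWatchdog(idle_timeout=0.05, on_idle=lambda: stopped.append(True))
watchdog.touch()                  # a job arrives
assert watchdog.poll() is False   # still within the idle window
time.sleep(0.06)
assert watchdog.poll() is True    # idle window expired: service stops
assert stopped == [True]
```

In practice the platform would run such a check on its own schedule (see the health check interval below); a job arriving at a stopped service then triggers auto-start rather than an error.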

Configuration

Persistent services support several configuration options:

  • Auto-start — Automatically start the service when a server launches.
  • Auto-restart — Restart the service if it crashes or becomes unhealthy.
  • Idle shutdown — Automatically stop the service after a period of inactivity to save costs.
  • Health check interval — How often the platform checks if the service is responsive.
  • Max queue size — Maximum number of jobs that can be queued when the service is busy.
  • Concurrent requests — Number of jobs the service can handle in parallel.
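The options above might be expressed as a configuration block like the following. The key names, units, and default values here are assumptions for illustration; consult the platform's actual schema for the real field names.

```python
# Hypothetical persistent-service configuration mirroring the options above.
# Key names and units are illustrative, not the platform's schema.
service_config = {
    "auto_start": True,                   # start when the server launches
    "auto_restart": True,                 # restart on crash or failed health check
    "idle_shutdown_minutes": 30,          # stop after 30 min of inactivity
    "health_check_interval_seconds": 15,  # responsiveness probe frequency
    "max_queue_size": 50,                 # jobs queued while the service is busy
    "concurrent_requests": 4,             # parallel jobs
}

# Sanity checks a deploy script might run before registering the service:
assert service_config["max_queue_size"] > 0
assert service_config["concurrent_requests"] >= 1
assert service_config["health_check_interval_seconds"] > 0
```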

Instance types

Persistent services run on dedicated compute instances. You can choose from CPU and GPU options:

| Category | Instance Types             | Best for                                       |
|----------|----------------------------|------------------------------------------------|
| CPU      | General purpose instances  | API servers, lightweight processing            |
| GPU      | NVIDIA T4, V100 instances  | ML inference, rendering, GPU-accelerated tasks |

GPU instances come with NVIDIA drivers and the container toolkit pre-configured, so GPU-enabled containers work out of the box.

Creating a persistent service

Persistent services use the same project structure as standard services — the difference is in how they're registered and run on the platform. See Creating a new Service for the base project setup.

When deploying, register the service as persistent through the Agent's persistent service tools. The key differences from standard deployment:

  1. Configure the instance type appropriate for your workload.
  2. Set lifecycle options — auto-start, auto-restart, idle shutdown.
  3. Define concurrency — how many parallel jobs the service should handle.

Job execution

When the Agent calls a persistent service tool:

  1. The job is sent to the Persistent Service Manager (PSM).
  2. PSM routes the job to the running instance.
  3. The instance executes the job in a Docker container.
  4. Results are returned to the Agent.

If the service is busy, the job is queued (up to the configured max queue size). If the service is stopped, the platform can auto-start it if configured.
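The queueing behaviour described above can be sketched with a bounded queue drained by a fixed pool of workers: jobs beyond the queue limit are rejected, and the worker count corresponds to the configured concurrent requests. All names here are illustrative, not the PSM's API.

```python
import queue
import threading

MAX_QUEUE_SIZE = 2       # "max queue size" setting
CONCURRENT_REQUESTS = 2  # "concurrent requests" setting

jobs = queue.Queue(maxsize=MAX_QUEUE_SIZE)
results = []
results_lock = threading.Lock()

def submit(job):
    """Queue a job; returns False when the queue is already full."""
    try:
        jobs.put_nowait(job)
        return True
    except queue.Full:
        return False

def worker():
    while True:
        job = jobs.get()
        if job is None:        # sentinel: worker shuts down
            jobs.task_done()
            return
        with results_lock:
            results.append(f"done:{job}")  # stand-in for running the job
        jobs.task_done()

# The service is "busy": no workers are draining yet, so the queue fills.
accepted = [submit(i) for i in range(4)]
# First MAX_QUEUE_SIZE jobs are queued; the rest are rejected.

workers = [threading.Thread(target=worker) for _ in range(CONCURRENT_REQUESTS)]
for t in workers:
    t.start()
jobs.join()                # wait for the queued jobs to complete
for _ in workers:
    jobs.put(None)         # stop each worker
for t in workers:
    t.join()

assert accepted == [True, True, False, False]
assert sorted(results) == ["done:0", "done:1"]
```

A bounded queue gives callers immediate backpressure instead of unbounded waiting, which matches the documented behaviour of rejecting work beyond the configured maximum.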

Service status

Persistent services have the following statuses:

  • Running — The service is active and accepting jobs.
  • Starting — The instance is being launched and the service is initializing.
  • Stopped — The service is not running. No compute costs are incurred.
  • Stopping — The service is shutting down gracefully.
  • Error — The service encountered a problem. Check logs for details.
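Client code that polls a service can treat these statuses as a small state machine. The statuses below come from the list above; the transition table is a plausible reading of the lifecycle, not the platform's documented state machine.

```python
from enum import Enum

class ServiceStatus(Enum):
    STOPPED = "stopped"
    STARTING = "starting"
    RUNNING = "running"
    STOPPING = "stopping"
    ERROR = "error"

# Plausible transitions implied by the lifecycle described above; the
# platform's actual state machine may differ.
TRANSITIONS = {
    ServiceStatus.STOPPED: {ServiceStatus.STARTING},
    ServiceStatus.STARTING: {ServiceStatus.RUNNING, ServiceStatus.ERROR},
    ServiceStatus.RUNNING: {ServiceStatus.STOPPING, ServiceStatus.ERROR},
    ServiceStatus.STOPPING: {ServiceStatus.STOPPED, ServiceStatus.ERROR},
    ServiceStatus.ERROR: {ServiceStatus.STARTING},  # e.g. auto-restart
}

def can_transition(current, target):
    """True if `target` is a plausible next status after `current`."""
    return target in TRANSITIONS.get(current, set())

assert can_transition(ServiceStatus.STOPPED, ServiceStatus.STARTING)
assert not can_transition(ServiceStatus.STOPPED, ServiceStatus.RUNNING)
```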

Best practices

  • Set idle shutdown for services that aren't used continuously. This avoids paying for compute when the service is inactive.
  • Use auto-restart for production services that need high availability.
  • Size your instance appropriately — over-provisioning wastes money, under-provisioning causes failures.
  • Monitor health checks to catch issues early. If a service becomes unresponsive, the platform will restart it automatically when auto-restart is enabled.
  • Design for concurrency if your service will handle multiple simultaneous requests. Ensure your tool implementation is thread-safe.
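The last point is worth illustrating: a persistent service that handles parallel jobs must guard any shared mutable state. Below is a minimal sketch of a hypothetical stateful tool (`InferenceTool` is our name, not a platform class) that protects a counter and cache with a lock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class InferenceTool:
    """Hypothetical stateful tool handling parallel jobs.

    Shared mutable state (a request counter and a result cache) is
    guarded by a lock so concurrent jobs cannot corrupt it.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._requests_served = 0
        self._cache = {}

    def run(self, job_input):
        with self._lock:
            self._requests_served += 1
            if job_input in self._cache:
                return self._cache[job_input]
        result = job_input * 2  # stand-in for the real (slow) work
        with self._lock:
            self._cache[job_input] = result
        return result

tool = InferenceTool()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(tool.run, range(100)))

assert results == [i * 2 for i in range(100)]
assert tool._requests_served == 100
```

Note that the expensive work happens outside the lock so parallel jobs are not serialized; only the bookkeeping is synchronized.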