Dagster Cloud Serverless is a fully managed version of Dagster Cloud, and is the easiest way to get started with Dagster. With Serverless, you can run your Dagster jobs without spinning up any infrastructure.
Serverless works best with workloads that primarily orchestrate other services or perform light computation. Most workloads fit into this category, especially those that orchestrate third-party SaaS products like cloud data warehouses and ETL tools.
If any of the following are applicable, you should select Hybrid deployment:
You require substantial computational resources. For example, training a large machine learning (ML) model in-process.
Your dataset is too large to fit in memory. For example, training a large machine learning (ML) model in-process on a terabyte of data.
You need to distribute computation across many nodes for a single run. Dagster Cloud runs currently execute on a single node with 4 CPUs.
You don't want to add Elementl as a data processor.
If you are a GitHub user, our GitHub integration is the fastest way to get started. It uses a GitHub app and GitHub Actions to set up a repo containing skeleton code and configuration consistent with Dagster Cloud's best practices with a single click.
When you create a new Dagster Cloud organization, you'll be prompted to choose Serverless or Hybrid deployment. Once activated, our GitHub integration will scaffold a new git repo for you with Serverless and Branch Deployments already configured. Pushing to the main branch will deploy to your prod Serverless deployment. Pull requests will spin up ephemeral branch deployments using the Serverless agent.
Without GitHub (GitLab, BitBucket, or local development)
If you don't want to use our GitHub integration, we offer a powerful CLI that you can use in another CI environment or on your local laptop.
Any dependencies specified in either requirements.txt or setup.py will be installed for you automatically by the Dagster Cloud Serverless infrastructure.
Dagster Cloud Serverless packages your code as PEX files and deploys them on Docker images. Using PEX files significantly reduces the time to deploy, since it does not require building a new Docker image and provisioning a new container for every code change. Many apps will work fine with the default Dagster Cloud Serverless setup. However, some apps may need changes to the runtime environment: to include data files, use a different base image or Python version, or install native dependencies. You can customize the runtime environment using the methods described below.
To add data files to your deployment, use the Data Files Support built into Python's setup.py. This requires adding a package_data or include_package_data keyword in the call to setup() in setup.py. For example, given this directory structure:
If you want to include the data folder, modify your setup.py to add the package_data line:
# setup.py
from setuptools import find_packages, setup

if __name__ == "__main__":
    setup(
        name="my_dagster_project",
        packages=find_packages(exclude=["my_dagster_project_tests"]),
        # Add the following line. Here "data/*" is relative to the my_dagster_project sub directory.
        package_data={"my_dagster_project": ["data/*"]},
        install_requires=["dagster", ...],
    )
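Once the data files are packaged, code in the project can read them through the standard library instead of hard-coding file system paths. The following is a minimal sketch, not part of any Dagster API: the helper name read_data_file and the file name lookup.csv are hypothetical, and it assumes the data folder sits directly inside the package as in the package_data line above.

```python
import pkgutil


def read_data_file(filename: str, package: str = "my_dagster_project") -> bytes:
    """Return the raw bytes of data/<filename> shipped inside `package`.

    Hypothetical helper: it relies on pkgutil.get_data, which resolves the
    resource relative to the installed package rather than the working directory.
    """
    data = pkgutil.get_data(package, f"data/{filename}")
    if data is None:
        # get_data returns None when the package's loader cannot serve resources.
        raise FileNotFoundError(f"data/{filename} not available in {package}")
    return data


# Hypothetical usage from a module inside my_dagster_project:
#     rows = read_data_file("lookup.csv").decode("utf-8").splitlines()
```

If the file was not matched by the package_data pattern, the call raises an OSError, which is a quick way to notice a packaging mistake.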
The default version of Python for Serverless deployments is Python 3.8. Versions 3.7, 3.9 and 3.10 are also supported. You can specify the version you want by updating your GitHub workflow or using the --python-version command line argument:
With GitHub: Change the python_version parameter for the build_deploy_python_executable job in your .github/workflows files. For example:
- name: Build and deploy Python executable
  if: env.ENABLE_FAST_DEPLOYS == 'true'
  uses: dagster-io/dagster-cloud-action/actions/build_deploy_python_executable@pex-v0.1
  with:
    dagster_cloud_file: "$GITHUB_WORKSPACE/project-repo/dagster_cloud.yaml"
    build_output_dir: "$GITHUB_WORKSPACE/build"
    python_version: "3.9"  # Change this value to the desired Python version
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
With the CLI: Add the --python-version CLI argument to the deploy command to specify the desired Python version, for example --python-version=3.9.
Using a different base image or using native dependencies
Dagster Cloud runs your code on a Docker image that we build as follows:
The standard Python "slim" Docker image, such as python:3.8-slim, is used as the base.
The dagster-cloud[serverless] module is installed in the image.
As far as possible, add all dependencies by including the corresponding native Python bindings in your setup.py. When that is not possible, you can build and upload a custom base image that will be used to run your Python code.
To build and upload the image, use the command line:
Build your Docker image using docker build or your usual Docker toolchain. Ensure the dagster-cloud[serverless] dependency is included. You can do this by adding the following to your Dockerfile:
RUN pip install "dagster-cloud[serverless]"
Upload your Docker image to Dagster Cloud using the upload-base-image command. Note that this command prints out the tag used in Dagster Cloud to identify your image:
$ dagster-cloud serverless upload-base-image local-image:tag
...
To use the uploaded image run: dagster-cloud deploy-python-executable ... --base-image-tag=sha256_518ad2f92b078c63c60e89f0310f13f19d3a1c7ea9e1976d67d59fcb7040d0d6
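If the upload runs inside a script, the tag can be pulled out of the command's output instead of being copied by hand. The sketch below assumes the output keeps the --base-image-tag=... form shown above, which may differ between CLI versions; the function name extract_base_image_tag is hypothetical.

```python
import re
from typing import Optional

# Matches the tag that follows "--base-image-tag=" up to the next whitespace.
_TAG_PATTERN = re.compile(r"--base-image-tag=(\S+)")


def extract_base_image_tag(cli_output: str) -> Optional[str]:
    """Return the base image tag found in upload-base-image output, or None."""
    match = _TAG_PATTERN.search(cli_output)
    return match.group(1) if match else None


# Sample line copied from the command output shown above.
sample_output = (
    "To use the uploaded image run: dagster-cloud deploy-python-executable ... "
    "--base-image-tag=sha256_518ad2f92b078c63c60e89f0310f13f19d3a1c7ea9e1976d67d59fcb7040d0d6"
)
print(extract_base_image_tag(sample_output))
```

Returning None rather than raising lets the calling script decide whether a missing tag should fail the build.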
To use a Docker image you have published to Dagster Cloud, use the --base-image-tag tag printed out by the above command.
With GitHub: Set the SERVERLESS_BASE_IMAGE_TAG environment variable in your GitHub Actions configuration (usually at .github/workflows/deploy.yml):
Prior to using PEX files, Dagster Cloud deployed code using Docker images. This feature is still available. To deploy using a Docker image instead of PEX:
With GitHub: Delete the ENABLE_FAST_DEPLOYS: 'true' line in your GitHub Actions configuration (usually at .github/workflows/deploy.yml):
With the CLI: Use the deploy command instead of the deploy-python-executable command:
dagster-cloud serverless deploy \
--location-name example \
--package-name assets_modern_data_stack
The deployed Docker image can be customized either by using lifecycle hooks or by customizing the base image.
Lifecycle hooks are the easiest method to set up and do not require any additional infrastructure.
In the root of your repo, you can provide two optional shell scripts: dagster_cloud_pre_install.sh and dagster_cloud_post_install.sh. These will run before and after Python dependencies are installed. They are useful for installing any non-Python dependencies or otherwise configuring your environment.
Customizing the base image is the most flexible method, but requires setting up a pipeline outside of Dagster to build the custom base image.
The default base image is debian:bullseye-slim, but it can be changed.
With GitHub: Provide a base_image input parameter to the Build and deploy step in your GitHub Actions configuration (usually at .github/workflows/deploy.yml):
- name: Build and deploy to Dagster Cloud serverless
  uses: dagster-io/dagster-cloud-action/actions/serverless_prod_deploy@v0.1
  with:
    dagster_cloud_api_token: ${{ secrets.DAGSTER_CLOUD_API_TOKEN }}
    location: ${{ toJson(matrix.location) }}
    # Use a custom base image
    base_image: "my_base_image:latest"
    organization_id: ${{ secrets.ORGANIZATION_ID }}
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
With the CLI: Add the --base-image CLI argument to the deploy command to specify the registry path to the desired base image:
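A minimal sketch of that invocation, assembled from a Python script as you might in a CI job outside GitHub. The helper build_deploy_command is hypothetical; the location name, package name, and image name are reused from the examples in this document, and passing --base-image as a separate argument followed by its value is an assumption about the CLI's argument style.

```python
import shlex
from typing import List, Optional


def build_deploy_command(
    location_name: str,
    package_name: str,
    base_image: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for `dagster-cloud serverless deploy`."""
    command = [
        "dagster-cloud", "serverless", "deploy",
        "--location-name", location_name,
        "--package-name", package_name,
    ]
    if base_image is not None:
        # Registry path of the custom base image (assumed argument style).
        command += ["--base-image", base_image]
    return command


command = build_deploy_command(
    "example", "assets_modern_data_stack", base_image="my_base_image:latest"
)
# Print the shell-quoted command for review; nothing is executed here.
print(shlex.join(command))
```

To run the deployment, the same list can be passed to subprocess.run(command, check=True). Building a list rather than a single string avoids shell-quoting problems with image names.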
If your organization begins to hit the limitations of Serverless, you should transition to a Hybrid deployment. Hybrid deployments allow you to run an agent in your own infrastructure and give you substantially more flexibility and control over the Dagster environment.
To switch to Hybrid, navigate to Status > Agents in your Dagster Cloud account. On this page, you can disable the Serverless agent and view instructions for enabling Hybrid.
Dagster Cloud Serverless offers two settings for run isolation: isolated and non-isolated. Non-isolated runs are for iterating quickly and trade off isolation for speed. Isolated runs are for production and compute-heavy assets and jobs.
Isolated runs each take place in their own container with their own compute resources: 4 CPU cores and 16 GB of RAM.
These runs may take up to 3 minutes to start while these resources are provisioned.
When launching runs manually, select Isolate run environment in the Launchpad to launch an isolated run. Scheduled, sensor, and backfill runs are always isolated.
Note: if non-isolated runs aren't enabled (see the section below), the toggle won't appear and all runs will be isolated.
Non-isolated runs can be enabled or disabled in deployment settings with:
non_isolated_runs:
  enabled: True
Non-isolated runs provide a faster start time by using a standing, shared container for each code location.
They have fewer compute resources: 0.25 vCPU cores and 1 GB of RAM. These resources are shared with other processes for a code location, such as sensors. As a result, it's recommended to use isolated runs for compute-intensive jobs and asset materializations.
While launching runs from the Launchpad, leave Isolate run environment unchecked to launch a non-isolated run. Materializing assets from the UI also defaults to non-isolated.
By default, only one non-isolated run will execute at once. While a run is in progress, the Launchpad will swap to only launching isolated runs.
This limit can be configured in deployment settings. Use caution: the limit is in place to help avoid crashes caused by out-of-memory (OOM) errors.