See also the Kubernetes deployment guide.
This library contains utilities for running Dagster with Kubernetes. This includes a Python API allowing Dagit to launch runs as Kubernetes Jobs, as well as a Helm chart you can use as the basis for a Dagster deployment on a Kubernetes cluster.
Docker image to use for launched Jobs. If this field is empty, the image that was used to originally load the Dagster repository will be used. (Ex: “mycompany.com/dagster-k8s-image:latest”).
Image pull policy to set on launched Pods.
Specifies that Kubernetes should get the credentials from the Secrets named in this list.
The name of the Kubernetes service account under which to run.
A list of custom ConfigMapEnvSource names from which to draw environment variables (using envFrom) for the Job. Default: []. See: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/#define-an-environment-variable-for-a-container
A list of custom Secret names from which to draw environment variables (using envFrom) for the Job. Default: []. See: https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#configure-all-key-value-pairs-in-a-secret-as-container-environment-variables
A list of environment variables to inject into the Job. Each can be of the form KEY=VALUE or just KEY (in which case the value will be pulled from the current process). Default: []. See: https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#configure-all-key-value-pairs-in-a-secret-as-container-environment-variables
A list of volume mounts to include in the job’s container. Default: []. See: https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#volumemount-v1-core
Default Value: []
A list of volumes to include in the Job’s Pod. Default: []. For the many possible volume source types that can be included, see: https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#volume-v1-core
Default Value: []
Labels to apply to all created pods. See: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels
Compute resource requirements for the container. See: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
Use a custom Kubernetes scheduler for launched Pods. See: https://kubernetes.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/
Security settings for the container. See: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-capabilities-for-a-container
The name of an existing Volume to mount into the pod in order to provide a ConfigMap for the Dagster instance. This Volume should contain a dagster.yaml with appropriate values for run storage, event log storage, etc.
The name of the Kubernetes Secret where the postgres password can be retrieved. Will be mounted and supplied as an environment variable to the Job Pod. The Secret must contain the key "postgresql-password", which will be exposed in the Job environment as the environment variable DAGSTER_PG_PASSWORD.
The location of DAGSTER_HOME in the Job container; this is where the dagster.yaml file will be mounted from the instance ConfigMap specified here. Defaults to /opt/dagster/dagster_home.
Default Value: ‘/opt/dagster/dagster_home’
Set this value if you are running the launcher within a k8s cluster. If True, we assume the launcher is running within the target cluster and load config using kubernetes.config.load_incluster_config. Otherwise, we will use the k8s config specified in kubeconfig_file (using kubernetes.config.load_kube_config) or fall back to the default kubeconfig.
Default Value: True
The kubeconfig file from which to load config. Defaults to using the default kubeconfig.
Default Value: None
Whether the launched Kubernetes Jobs and Pods should fail if the Dagster run fails.
Raw Kubernetes configuration for launched runs.
Default Value: ‘default’
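The environment variable list described above accepts entries of the form KEY=VALUE or a bare KEY whose value is taken from the current process. The following is a minimal sketch of how such a list could be resolved into a mapping. The function name resolve_env_vars and the choice to skip bare keys that are unset in the process are assumptions made for illustration, not the library's implementation.

```python
import os
from typing import Dict, List, Mapping, Optional


def resolve_env_vars(
    env_vars: List[str], process_env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Resolve KEY=VALUE / KEY entries into a name -> value mapping (hypothetical helper)."""
    source = os.environ if process_env is None else process_env
    resolved: Dict[str, str] = {}
    for entry in env_vars:
        if "=" in entry:
            # Split on the first "=" only, so values may themselves contain "=".
            key, value = entry.split("=", 1)
            resolved[key] = value
        elif entry in source:
            # Bare KEY: copy the value from the launching process.
            resolved[entry] = source[entry]
        # Assumption: a bare KEY that is unset in the process is skipped.
    return resolved


example = resolve_env_vars(
    ["LOG_LEVEL=debug", "DB_URL=postgres://host/db?sslmode=require", "HOME_REGION", "UNSET_KEY"],
    process_env={"HOME_REGION": "us-west-2"},
)
print(example)
```

Passing process_env explicitly keeps the sketch deterministic; leaving it out reads from os.environ.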
RunLauncher that starts a Kubernetes Job for each Dagster job run.
Encapsulates each run in a separate, isolated invocation of dagster-graphql.
You can configure a Dagster instance to use this RunLauncher by adding a section to your dagster.yaml like the following:
run_launcher:
  module: dagster_k8s.launcher
  class: K8sRunLauncher
  config:
    service_account_name: your_service_account
    job_image: my_project/dagster_image:latest
    instance_config_map: dagster-instance
    postgres_password_secret: dagster-postgresql-secret
Docker image to use for launched Jobs. If this field is empty, the image that was used to originally load the Dagster repository will be used. (Ex: “mycompany.com/dagster-k8s-image:latest”).
Image pull policy to set on launched Pods.
Specifies that Kubernetes should get the credentials from the Secrets named in this list.
The name of the Kubernetes service account under which to run.
A list of custom ConfigMapEnvSource names from which to draw environment variables (using envFrom) for the Job. Default: []. See: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/#define-an-environment-variable-for-a-container
A list of custom Secret names from which to draw environment variables (using envFrom) for the Job. Default: []. See: https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#configure-all-key-value-pairs-in-a-secret-as-container-environment-variables
A list of environment variables to inject into the Job. Each can be of the form KEY=VALUE or just KEY (in which case the value will be pulled from the current process). Default: []. See: https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#configure-all-key-value-pairs-in-a-secret-as-container-environment-variables
A list of volume mounts to include in the job’s container. Default: []. See: https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#volumemount-v1-core
Default Value: []
A list of volumes to include in the Job’s Pod. Default: []. For the many possible volume source types that can be included, see: https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#volume-v1-core
Default Value: []
Labels to apply to all created pods. See: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels
Compute resource requirements for the container. See: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
Use a custom Kubernetes scheduler for launched Pods. See: https://kubernetes.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/
Security settings for the container. See: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-capabilities-for-a-container
Whether retries are enabled or not. By default, retries are enabled.
Default Value: {"enabled": {}}
Limit on the number of pods that will run concurrently within the scope of a Dagster run. Note that this limit is per run, not global.
A set of limits that are applied to steps with particular tags. If a value is set, the limit is applied to only that key-value pair. If no value is set, the limit is applied across all values of that key. If the value is set to a dict with applyLimitPerUniqueValue: true, the limit will apply to the number of unique values for that key. Note that these limits are per run, not global.
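The per-tag limits described above can be pictured as a check performed before each step is started. The sketch below covers only the first two cases (a limit on one key-value pair, and a limit across all values of a key); the applyLimitPerUniqueValue form is deliberately left out. The rule shape (key, value, limit) and the function name can_start_step are assumptions for illustration, not the executor's actual code.

```python
from typing import Dict, List, Optional, TypedDict


class TagLimit(TypedDict, total=False):
    key: str              # tag key the limit applies to
    value: Optional[str]  # if omitted or None, the limit spans all values of the key
    limit: int            # maximum number of matching steps running at once


def can_start_step(
    step_tags: Dict[str, str],
    running_step_tags: List[Dict[str, str]],
    limits: List[TagLimit],
) -> bool:
    """Return True if starting a step with step_tags would not exceed any limit."""
    for rule in limits:
        key = rule["key"]
        value = rule.get("value")
        if key not in step_tags:
            continue  # rule does not concern this step
        if value is not None and step_tags[key] != value:
            continue  # rule targets a different value of this key
        # Count running steps (within this run) that the rule also covers.
        matching = sum(
            1
            for tags in running_step_tags
            if key in tags and (value is None or tags[key] == value)
        )
        if matching >= rule["limit"]:
            return False
    return True


limits: List[TagLimit] = [
    {"key": "database", "value": "redshift", "limit": 1},
    {"key": "team", "limit": 2},
]
running = [{"database": "redshift", "team": "a"}, {"team": "b"}]
print(can_start_step({"database": "redshift"}, running, limits))
print(can_start_step({"database": "snowflake"}, running, limits))
```

Because the limits are per run, running_step_tags would hold only the steps of the current run.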
Executor which launches steps as Kubernetes Jobs.
To use the k8s_job_executor, set it as the executor_def when defining a job:
from dagster_k8s import k8s_job_executor
from dagster import job

@job(executor_def=k8s_job_executor)
def k8s_job():
    pass
Then you can configure the executor with run config as follows:
execution:
  config:
    job_namespace: 'some-namespace'
    image_pull_policy: ...
    image_pull_secrets: ...
    service_account_name: ...
    env_config_maps: ...
    env_secrets: ...
    env_vars: ...
    job_image: ... # leave out if using userDeployments
    max_concurrent: ...
max_concurrent limits the number of pods that will execute concurrently for one run. By default there is no limit: execution is as parallel as the DAG allows. Note that this is not a global limit.
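To make the effect of max_concurrent concrete, here is a small, self-contained simulation. It assumes every step takes one "tick" and groups steps into waves, capping each wave at max_concurrent. The function schedule_waves and the toy dependency map are hypothetical and only illustrate the idea of a per-run cap; the real executor launches pods as earlier ones finish rather than in fixed waves.

```python
from typing import Dict, List, Optional, Set


def schedule_waves(
    deps: Dict[str, Set[str]], max_concurrent: Optional[int] = None
) -> List[List[str]]:
    """Group steps into waves; each wave holds at most max_concurrent steps."""
    if max_concurrent is not None and max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1 when set")
    remaining = dict(deps)
    done: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        # A step is ready once all of its upstream steps have finished.
        ready = sorted(step for step, upstream in remaining.items() if upstream <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown upstream step")
        # No limit: run everything the DAG allows. Otherwise cap the wave size.
        batch = ready if max_concurrent is None else ready[:max_concurrent]
        waves.append(batch)
        done.update(batch)
        for step in batch:
            del remaining[step]
    return waves


# Three independent steps feeding one downstream step.
dag = {"a": set(), "b": set(), "c": set(), "d": {"a", "b", "c"}}
print(schedule_waves(dag))                    # unlimited
print(schedule_waves(dag, max_concurrent=2))  # capped at two pods per wave
```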
Configuration set on the Kubernetes Jobs and Pods created by the K8sRunLauncher will also be set on Kubernetes Jobs and Pods created by the k8s_job_executor.
Configuration set using tags on a @job will only apply to the run level. For configuration to apply at each step it must be set using tags for each @op.
Image pull policy to set on launched Pods.
Specifies that Kubernetes should get the credentials from the Secrets named in this list.
The name of the Kubernetes service account under which to run.
A list of custom ConfigMapEnvSource names from which to draw environment variables (using envFrom) for the Job. Default: []. See: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/#define-an-environment-variable-for-a-container
A list of custom Secret names from which to draw environment variables (using envFrom) for the Job. Default: []. See: https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#configure-all-key-value-pairs-in-a-secret-as-container-environment-variables
A list of environment variables to inject into the Job. Each can be of the form KEY=VALUE or just KEY (in which case the value will be pulled from the current process). Default: []. See: https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#configure-all-key-value-pairs-in-a-secret-as-container-environment-variables
A list of volume mounts to include in the job’s container. Default: []. See: https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#volumemount-v1-core
Default Value: []
A list of volumes to include in the Job’s Pod. Default: []. For the many possible volume source types that can be included, see: https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#volume-v1-core
Default Value: []
Labels to apply to all created pods. See: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels
Compute resource requirements for the container. See: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
Use a custom Kubernetes scheduler for launched Pods. See: https://kubernetes.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/
Security settings for the container. See: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-capabilities-for-a-container
The image in which to launch the k8s job.
The command to run in the container within the launched k8s job.
The args for the command for the container.
Set this value if you are running the launcher within a k8s cluster. If True, we assume the launcher is running within the target cluster and load config using kubernetes.config.load_incluster_config. Otherwise, we will use the k8s config specified in kubeconfig_file (using kubernetes.config.load_kube_config) or fall back to the default kubeconfig.
Default Value: True
The kubeconfig file from which to load config. Defaults to using the default kubeconfig.
Default Value: None
How long to wait for the job to succeed before raising an exception.
Raw k8s config for the k8s pod’s main container (https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#container-v1-core). Keys can be either snake_case or camelCase.
Raw k8s config for the k8s pod’s metadata (https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#objectmeta-v1-meta). Keys can be either snake_case or camelCase.
Raw k8s config for the k8s pod’s pod spec (https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#podspec-v1-core). Keys can be either snake_case or camelCase.
Raw k8s config for the k8s job’s metadata (https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#objectmeta-v1-meta). Keys can be either snake_case or camelCase.
Raw k8s config for the k8s job’s job spec (https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#jobspec-v1-batch). Keys can be either snake_case or camelCase.
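Since the raw config fields above accept keys in either snake_case or camelCase, one way to picture this is a normalization pass that rewrites snake_case keys into the camelCase names used by the Kubernetes API. The sketch below is a naive illustration with hypothetical helper names (snake_to_camel, normalize_keys); it is not the library's implementation. In particular, it rewrites every dict key it finds, so user-defined keys such as labels or annotations containing underscores would need to be excluded in real use.

```python
from typing import Any


def snake_to_camel(name: str) -> str:
    """Convert image_pull_policy -> imagePullPolicy; camelCase input is unchanged."""
    head, *rest = name.split("_")
    # Upper-case only the first letter of each later part, preserving the rest.
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


def normalize_keys(config: Any) -> Any:
    """Recursively rewrite dict keys to camelCase, leaving values untouched."""
    if isinstance(config, dict):
        return {snake_to_camel(key): normalize_keys(value) for key, value in config.items()}
    if isinstance(config, list):
        return [normalize_keys(item) for item in config]
    return config


container_config = {
    "image_pull_policy": "Always",
    "securityContext": {"run_as_user": 1000},
    "env": [{"name": "A", "value_from": {"secret_key_ref": {"name": "s", "key": "k"}}}],
}
print(normalize_keys(container_config))
```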
An op that runs a Kubernetes job using the k8s API.
Contrast with the k8s_job_executor, which runs each Dagster op in a Dagster job in its own k8s job.
This op may be useful when you need to orchestrate a command that isn’t a Dagster op (or isn’t written in Python), or when you want to run the rest of a Dagster job using a specific executor and only a single op in k8s.
You can create your own op with the same implementation by calling the execute_k8s_job function inside your own op.
For example:
from dagster_k8s import k8s_job_op
from dagster import job

first_op = k8s_job_op.configured(
    {
        "image": "busybox",
        "command": ["/bin/sh", "-c"],
        "args": ["echo HELLO"],
    },
    name="first_op",
)
second_op = k8s_job_op.configured(
    {
        "image": "busybox",
        "command": ["/bin/sh", "-c"],
        "args": ["echo GOODBYE"],
    },
    name="second_op",
)

@job
def full_job():
    second_op(first_op())
The service account that is used to run this job should have the following RBAC permissions:
rules:
  - apiGroups: ["batch"]
    resources: ["jobs", "jobs/status"]
    verbs: ["*"]
  # The empty arg "" corresponds to the core API group
  - apiGroups: [""]
    resources: ["pods", "pods/log", "pods/status"]
    verbs: ["*"]
This function is a utility for executing a Kubernetes job from within a Dagster op.
image (str) – The image in which to launch the k8s job.
command (Optional[List[str]]) – The command to run in the container within the launched k8s job. Default: None.
args (Optional[List[str]]) – The args for the command for the container. Default: None.
namespace (Optional[str]) – Override the kubernetes namespace in which to run the k8s job. Default: None.
image_pull_policy (Optional[str]) – Allows the image pull policy to be overridden, e.g. to facilitate local testing with kind. Default: "Always". See: https://kubernetes.io/docs/concepts/containers/images/#updating-images.
image_pull_secrets (Optional[List[Dict[str, str]]]) – Optionally, a list of dicts, each of which corresponds to a Kubernetes LocalObjectReference (e.g., {'name': 'myRegistryName'}). This allows you to specify the imagePullSecrets on a pod basis. Typically, these will be provided through the service account, when needed, and you will not need to pass this argument. See: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod and https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#podspec-v1-core
service_account_name (Optional[str]) – The name of the Kubernetes service account under which to run the Job. Defaults to “default”.
env_config_maps (Optional[List[str]]) – A list of custom ConfigMapEnvSource names from which to draw environment variables (using envFrom) for the Job. Default: []. See: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/#define-an-environment-variable-for-a-container
env_secrets (Optional[List[str]]) – A list of custom Secret names from which to draw environment variables (using envFrom) for the Job. Default: []. See: https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#configure-all-key-value-pairs-in-a-secret-as-container-environment-variables
env_vars (Optional[List[str]]) – A list of environment variables to inject into the Job. Default: []. See: https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#configure-all-key-value-pairs-in-a-secret-as-container-environment-variables
volume_mounts (Optional[List[Permissive]]) – A list of volume mounts to include in the job’s container. Default: []. See: https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#volumemount-v1-core
volumes (Optional[List[Permissive]]) – A list of volumes to include in the Job’s Pod. Default: []. See: https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#volume-v1-core
labels (Optional[Dict[str, str]]) – Additional labels that should be included in the Job’s Pod. See: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels
resources (Optional[Dict[str, Any]]) – Compute resource requirements for the container. See: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
scheduler_name (Optional[str]) – Use a custom Kubernetes scheduler for launched Pods. See: https://kubernetes.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/
load_incluster_config (bool) – Whether the op is running within a k8s cluster. If True, we assume the launcher is running within the target cluster and load config using kubernetes.config.load_incluster_config. Otherwise, we will use the k8s config specified in kubeconfig_file (using kubernetes.config.load_kube_config) or fall back to the default kubeconfig. Default: True.
kubeconfig_file (Optional[str]) – The kubeconfig file from which to load config. Defaults to using the default kubeconfig. Default: None.
timeout (Optional[int]) – Raise an exception if the op takes longer than this timeout in seconds to execute. Default: None.
container_config (Optional[Dict[str, Any]]) – Raw k8s config for the k8s pod’s main container (https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#container-v1-core). Keys can be either snake_case or camelCase. Default: None.
pod_template_spec_metadata (Optional[Dict[str, Any]]) – Raw k8s config for the k8s pod’s metadata (https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#objectmeta-v1-meta). Keys can be either snake_case or camelCase. Default: None.
pod_spec_config (Optional[Dict[str, Any]]) – Raw k8s config for the k8s pod’s pod spec (https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#podspec-v1-core). Keys can be either snake_case or camelCase. Default: None.
job_metadata (Optional[Dict[str, Any]]) – Raw k8s config for the k8s job’s metadata (https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#objectmeta-v1-meta). Keys can be either snake_case or camelCase. Default: None.
job_spec_config (Optional[Dict[str, Any]]) – Raw k8s config for the k8s job’s job spec (https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#jobspec-v1-batch). Keys can be either snake_case or camelCase. Default: None.
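The interaction between load_incluster_config and kubeconfig_file described in the parameters above can be sketched as a small decision function. The helper names pick_config_loader and load_k8s_config are hypothetical; kubernetes.config.load_incluster_config and kubernetes.config.load_kube_config are the functions named in the text. The kubernetes import is deferred so that the decision logic can be read and exercised without the client library installed.

```python
from typing import Any, Dict, Optional, Tuple


def pick_config_loader(
    load_incluster_config: bool = True, kubeconfig_file: Optional[str] = None
) -> Tuple[str, Dict[str, Any]]:
    """Return the name of the kubernetes.config loader to call and its kwargs."""
    if load_incluster_config:
        # Running inside the target cluster: use the pod's service account.
        return "load_incluster_config", {}
    # Outside the cluster: config_file=None makes the client fall back to the
    # default kubeconfig location.
    return "load_kube_config", {"config_file": kubeconfig_file}


def load_k8s_config(
    load_incluster_config: bool = True, kubeconfig_file: Optional[str] = None
) -> None:
    """Apply the chosen loader (requires the third-party kubernetes package)."""
    import kubernetes.config  # deferred so the sketch imports without the package

    loader_name, kwargs = pick_config_loader(load_incluster_config, kubeconfig_file)
    getattr(kubernetes.config, loader_name)(**kwargs)


print(pick_config_loader())
print(pick_config_loader(False, "/tmp/kubeconfig"))
```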
The K8sRunLauncher allows Dagit instances to be configured to launch new runs by starting per-run Kubernetes Jobs. To configure the K8sRunLauncher, your dagster.yaml should include a section like:
run_launcher:
  module: dagster_k8s.launcher
  class: K8sRunLauncher
  config:
    image_pull_secrets:
    service_account_name: dagster
    job_image: "my-company.com/image:latest"
    dagster_home: "/opt/dagster/dagster_home"
    postgres_password_secret: "dagster-postgresql-secret"
    image_pull_policy: "IfNotPresent"
    job_namespace: "dagster"
    instance_config_map: "dagster-instance"
    env_config_maps:
      - "dagster-k8s-job-runner-env"
    env_secrets:
      - "dagster-k8s-some-secret"
For local dev (e.g., on kind or minikube):
helm install \
--set dagit.image.repository="dagster.io/buildkite-test-image" \
--set dagit.image.tag="py37-latest" \
--set job_runner.image.repository="dagster.io/buildkite-test-image" \
--set job_runner.image.tag="py37-latest" \
--set imagePullPolicy="IfNotPresent" \
dagster \
helm/dagster/
Upon installation, the Helm chart will provide instructions for port forwarding Dagit and Flower (if configured).
To run the unit tests:
pytest -m "not integration"
To run the integration tests, you must have Docker, kind, and helm installed.
On macOS:
brew install kind
brew install helm
Docker must be running.
You may experience slow first test runs thanks to image pulls (run pytest -svv --fulltrace for visibility). Building images and loading them to the kind cluster is slow, and there is no visibility into the progress of the load.
NOTE: This process is quite slow, as it requires bootstrapping a local kind cluster with Docker images and the dagster-k8s Helm chart. For faster development, you can either:
Keep a warm kind cluster
Use a remote K8s cluster, e.g. via AWS EKS or GCP GKE
Instructions are below.
You may find that the loop of kind cluster creation, image loading, and Helm chart installation is too slow for effective local dev.
You may bypass cluster creation and image loading in the following way. First add the --no-cleanup flag to your pytest invocation:
pytest --no-cleanup -s -vvv -m "not integration"
The tests will run as before, but the kind cluster will be left running after the tests are completed.
For subsequent test runs, you can run:
pytest --kind-cluster="cluster-d9971c84d44d47f382a2928c8c161faa" --existing-helm-namespace="dagster-test-95590a" -s -vvv -m "not integration"
This will bypass cluster creation, image loading, and Helm chart installation, for much faster tests.
The kind cluster name and Helm namespace for this command can be found in the logs, or retrieved via the respective CLIs, using kind get clusters and kubectl get namespaces. Note that for kubectl and helm to work correctly with a kind cluster, you should override your kubeconfig file location with:
kind get kubeconfig --name kind-test > /tmp/kubeconfig
export KUBECONFIG=/tmp/kubeconfig
The test fixtures provided by dagster-k8s automate the process described below, but sometimes it’s useful to manually configure a kind cluster and load images onto it.
First, ensure you have a Docker image appropriate for your Python version. Run, from the root of the repo:
./python_modules/dagster-test/dagster_test/test_project/build.sh 3.7.6 \
dagster.io.priv/buildkite-test-image:py37-latest
In the above invocation, the Python major/minor version should be appropriate for your desired tests.
Then run the following commands to create the cluster and load the image. Note that there is no feedback from the loading process.
kind create cluster --name kind-test
kind load docker-image --name kind-test dagster.io/dagster-docker-buildkite:py37-latest
If you are deploying the Helm chart with an in-cluster Postgres (rather than an external database), and/or with dagster-celery workers (and a RabbitMQ), you’ll also want to have images present for rabbitmq and postgresql:
docker pull docker.io/bitnami/rabbitmq
docker pull docker.io/bitnami/postgresql
kind load docker-image --name kind-test docker.io/bitnami/rabbitmq:latest
kind load docker-image --name kind-test docker.io/bitnami/postgresql:latest
Then you can run pytest as follows:
pytest --kind-cluster=kind-test
If you already have a development K8s cluster available, you can run tests on that cluster instead of running locally in kind.
For this to work, first build and deploy the test image to a registry available to your cluster. For example, with a private ECR repository:
./python_modules/dagster-test/dagster_test/test_project/build.sh 3.7.6
docker tag dagster-docker-buildkite:latest $AWS_ACCOUNT_ID.dkr.ecr.us-west-2.amazonaws.com/dagster-k8s-tests:2020-04-21T21-04-06
aws ecr get-login --no-include-email --region us-west-2 | sh
docker push $AWS_ACCOUNT_ID.dkr.ecr.us-west-2.amazonaws.com/dagster-k8s-tests:2020-04-21T21-04-06
Then, you can run tests on EKS with:
export DAGSTER_DOCKER_IMAGE_TAG="2020-04-21T21-04-06"
export DAGSTER_DOCKER_REPOSITORY="$AWS_ACCOUNT_ID.dkr.ecr.us-west-2.amazonaws.com"
export DAGSTER_DOCKER_IMAGE="dagster-k8s-tests"
# First run with --no-cleanup to leave Helm chart in place
pytest --cluster-provider="kubeconfig" --no-cleanup -s -vvv
# Subsequent runs against existing Helm chart
pytest --cluster-provider="kubeconfig" --existing-helm-namespace="dagster-test-<some id>" -s -vvv
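The three DAGSTER_DOCKER_* variables above line up with the pieces of the image reference that was tagged and pushed earlier (repository, image name, tag). As a sketch, assuming the tests join them in the usual repository/image:tag form, the full reference could be assembled as follows. The helper name full_image_name and the placeholder account ID are hypothetical.

```python
import os
from typing import Mapping, Optional


def full_image_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Join the DAGSTER_DOCKER_* variables into repository/image:tag."""
    env = os.environ if env is None else env
    repository = env["DAGSTER_DOCKER_REPOSITORY"]
    image = env["DAGSTER_DOCKER_IMAGE"]
    tag = env["DAGSTER_DOCKER_IMAGE_TAG"]
    return f"{repository}/{image}:{tag}"


# 123456789012 is a placeholder for $AWS_ACCOUNT_ID.
example_env = {
    "DAGSTER_DOCKER_IMAGE_TAG": "2020-04-21T21-04-06",
    "DAGSTER_DOCKER_REPOSITORY": "123456789012.dkr.ecr.us-west-2.amazonaws.com",
    "DAGSTER_DOCKER_IMAGE": "dagster-k8s-tests",
}
print(full_image_name(example_env))
```

A quick check like this can confirm that the exported variables point at the same reference that was pushed to the registry.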
To test / validate Helm charts, you can run:
helm install dagster --dry-run --debug helm/dagster
helm lint
To enable GCR access from Minikube:
kubectl create secret docker-registry element-dev-key \
--docker-server=https://gcr.io \
--docker-username=oauth2accesstoken \
--docker-password="$(gcloud auth print-access-token)" \
--docker-email=my@email.com
Both the Postgres and the RabbitMQ Helm charts will store credentials using Persistent Volume Claims, which will outlive test invocations and calls to helm uninstall. These must be deleted if you want to change credentials. To view your PVCs, run:
kubectl get pvc
The Redis Helm chart installs with a randomly generated password by default; turn this off:
helm install dagredis stable/redis --set usePassword=false
Then, to connect to your database from outside the cluster, execute the following commands:
kubectl port-forward --namespace default svc/dagredis-master 6379:6379
redis-cli -h 127.0.0.1 -p 6379