Defines a way of configuring a job where the job can be run on one of a discrete set of partitions, and each partition corresponds to a run configuration for the job.
Setting PartitionedConfig as the config for a job allows you to launch backfills for that job and view the run history across partitions.
Creates a static partitioned config for a job.
The provided partition_keys is a static list of strings identifying the set of partitions. The list of partitions is static, so while the run config returned by the decorated function may change over time, the list of valid partition keys does not.
This has performance advantages over dynamic_partitioned_config in terms of loading different partition views in Dagit.
The decorated function takes in a partition key and returns a valid run config for a particular target job.
partition_keys (Sequence[str]) – A list of valid partition keys, which serve as the range of values that can be provided to the decorated run config function.
PartitionedConfig
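The contract of the decorated function is simple: given one key from a fixed list, return a run config dictionary. The plain-Python sketch below illustrates that contract outside Dagster. `PARTITION_KEYS`, `run_config_for_partition`, and the op name `process_region` are hypothetical; only the nested `{"ops": {name: {"config": ...}}}` layout mirrors Dagster run config.

```python
from typing import Any, Dict, List

# Hypothetical, fixed set of partition keys: it never changes at runtime.
PARTITION_KEYS: List[str] = ["us-east", "us-west", "eu-central"]


def run_config_for_partition(partition_key: str) -> Dict[str, Any]:
    """Return run config for one partition (illustration only)."""
    if partition_key not in PARTITION_KEYS:
        # Keys outside the static list are not valid partitions.
        raise ValueError(f"Unknown partition key: {partition_key!r}")
    # "process_region" is a hypothetical op name.
    return {"ops": {"process_region": {"config": {"region": partition_key}}}}
```

In a Dagster codebase the same function would be decorated with `@static_partitioned_config(partition_keys=PARTITION_KEYS)`, and the resulting PartitionedConfig passed as the job's config.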
Creates a dynamic partitioned config for a job.
The provided partition_fn returns a list of strings identifying the set of partitions, given an optional datetime argument (representing the current time). The list of partitions returned may change over time.
The decorated function takes in a partition key and returns a valid run config for a particular target job.
partition_fn (Callable[[datetime.datetime], Sequence[str]]) – A function that generates a list of valid partition keys, which serve as the range of values that can be provided to the decorated run config function.
PartitionedConfig
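Unlike the static case, the key list here is computed on each call, so it can change over time. The sketch below shows a hypothetical partition_fn that emits one key per fully elapsed day since a fixed start date; the name `START`, the daily cadence, and the key format are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical first day of data for this sketch.
START = datetime(2022, 1, 1)


def partition_fn(current_time: Optional[datetime] = None) -> List[str]:
    """Hypothetical partition_fn: one key per day that has fully elapsed
    by current_time, so the list grows as time passes."""
    now = current_time or datetime.now()
    keys: List[str] = []
    day = START
    # Only include days whose end is at or before `now`.
    while day + timedelta(days=1) <= now:
        keys.append(day.strftime("%Y-%m-%d"))
        day += timedelta(days=1)
    return keys
```

Calling it with a later `current_time` returns a longer list, which is exactly why the set of partitions for a dynamic config cannot be fixed ahead of time.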
Defines run config over a set of hourly partitions.
The decorated function should accept a start datetime and end datetime, which represent the bounds of the date partition the config should delineate.
The decorated function should return a run config dictionary.
The resulting object created by this decorator can be provided to the config argument of a Job. The first partition in the set will start at the start_date at midnight. The last partition in the set will end before the current time, unless the end_offset argument is set to a positive number. If minute_offset is provided, the start and end times of each partition will be minute_offset past the hour.
start_date (Union[datetime.datetime, str]) – The first date in the set of partitions. Can provide in either a datetime or string format.
minute_offset (int) – Number of minutes past the hour to “split” the partition. Defaults to 0.
fmt (Optional[str]) – The date format to use. Defaults to %Y-%m-%d-%H:%M.
timezone (Optional[str]) – The timezone in which each date should exist. Supported strings for timezones are the ones provided by the IANA time zone database <https://www.iana.org/time-zones> - e.g. “America/Los_Angeles”.
end_offset (int) – Extends the partition set by a number of partitions equal to the value passed. If end_offset is 0 (the default), the last partition ends before the current time. If end_offset is 1, the second-to-last partition ends before the current time, and so on.
@hourly_partitioned_config(start_date=datetime(2022, 3, 12))
# creates partitions (2022-03-12-00:00, 2022-03-12-01:00), (2022-03-12-01:00, 2022-03-12-02:00), ...
@hourly_partitioned_config(start_date=datetime(2022, 3, 12), minute_offset=15)
# creates partitions (2022-03-12-00:15, 2022-03-12-01:15), (2022-03-12-01:15, 2022-03-12-02:15), ...
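The window boundaries in the examples above can be reproduced with plain datetime arithmetic. This is a sketch, not Dagster's implementation: it assumes naive datetimes (no timezone handling), treats a window ending exactly at `now` as complete, and ignores end_offset; `hourly_windows` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def hourly_windows(
    start_date: datetime, now: datetime, minute_offset: int = 0
) -> List[Tuple[datetime, datetime]]:
    """Sketch: hourly [start, end) windows that have ended by `now`.

    Each window starts minute_offset minutes past the hour; the first one is
    the first such boundary on or after start_date. Naive datetimes only.
    """
    first = start_date.replace(minute=minute_offset, second=0, microsecond=0)
    if first < start_date:
        first += timedelta(hours=1)
    windows: List[Tuple[datetime, datetime]] = []
    cursor = first
    # Stop before the window that is still in progress at `now`.
    while cursor + timedelta(hours=1) <= now:
        windows.append((cursor, cursor + timedelta(hours=1)))
        cursor += timedelta(hours=1)
    return windows
```

With `start_date=datetime(2022, 3, 12)` and `minute_offset=15`, the first window is (00:15, 01:15), matching the second example.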
Defines run config over a set of daily partitions.
The decorated function should accept a start datetime and end datetime, which represent the bounds of the date partition the config should delineate.
The decorated function should return a run config dictionary.
The resulting object created by this decorator can be provided to the config argument of a Job. The first partition in the set will start at the start_date at midnight. The last partition in the set will end before the current time, unless the end_offset argument is set to a positive number. If minute_offset and/or hour_offset are used, the start and end times of each partition will be hour_offset:minute_offset of each day.
start_date (Union[datetime.datetime, str]) – The first date in the set of partitions. Can provide in either a datetime or string format.
minute_offset (int) – Number of minutes past the hour to “split” the partition. Defaults to 0.
hour_offset (int) – Number of hours past 00:00 to “split” the partition. Defaults to 0.
timezone (Optional[str]) – The timezone in which each date should exist. Supported strings for timezones are the ones provided by the IANA time zone database <https://www.iana.org/time-zones> - e.g. “America/Los_Angeles”.
fmt (Optional[str]) – The date format to use. Defaults to %Y-%m-%d.
end_offset (int) – Extends the partition set by a number of partitions equal to the value passed. If end_offset is 0 (the default), the last partition ends before the current time. If end_offset is 1, the second-to-last partition ends before the current time, and so on.
@daily_partitioned_config(start_date="2022-03-12")
# creates partitions (2022-03-12-00:00, 2022-03-13-00:00), (2022-03-13-00:00, 2022-03-14-00:00), ...
@daily_partitioned_config(start_date="2022-03-12", minute_offset=15, hour_offset=16)
# creates partitions (2022-03-12-16:15, 2022-03-13-16:15), (2022-03-13-16:15, 2022-03-14-16:15), ...
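To see how hour_offset, minute_offset, and a string start_date interact, here is a plain-Python sketch that lists the keys of daily partitions that have fully ended. It assumes naive datetimes and that each key is the window start formatted with fmt; `daily_partition_keys` is a hypothetical helper, not part of Dagster.

```python
from datetime import datetime, timedelta
from typing import List, Union


def daily_partition_keys(
    start_date: Union[datetime, str],
    now: datetime,
    hour_offset: int = 0,
    minute_offset: int = 0,
    fmt: str = "%Y-%m-%d",
) -> List[str]:
    """Sketch: keys of daily partitions that have fully ended by `now`."""
    # Accept either a datetime or a string in `fmt`, as start_date does above.
    start = datetime.strptime(start_date, fmt) if isinstance(start_date, str) else start_date
    first = start.replace(hour=hour_offset, minute=minute_offset, second=0, microsecond=0)
    if first < start:
        first += timedelta(days=1)
    keys: List[str] = []
    cursor = first
    while cursor + timedelta(days=1) <= now:
        # The key is the window's start, formatted with `fmt`.
        keys.append(cursor.strftime(fmt))
        cursor += timedelta(days=1)
    return keys
```

Note that with the default fmt, the offsets shift the window boundaries but do not appear in the keys themselves.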
Defines run config over a set of weekly partitions.
The decorated function should accept a start datetime and end datetime, which represent the bounds of the date partition the config should delineate.
The decorated function should return a run config dictionary.
The resulting object created by this decorator can be provided to the config argument of a Job. The first partition in the set will start at the first partition boundary on or after start_date. The last partition in the set will end before the current time, unless the end_offset argument is set to a positive number. If day_offset is provided, the start and end date of each partition will fall on the day of the week corresponding to day_offset (0-indexed, with Sunday as the start of the week). If minute_offset and/or hour_offset are used, the start and end times of each partition will be hour_offset:minute_offset of each day.
start_date (Union[datetime.datetime, str]) – The first date in the set of partitions will be the first Sunday at midnight following start_date. Can provide in either a datetime or string format.
minute_offset (int) – Number of minutes past the hour to “split” the partition. Defaults to 0.
hour_offset (int) – Number of hours past 00:00 to “split” the partition. Defaults to 0.
day_offset (int) – Day of the week to “split” the partition. Defaults to 0 (Sunday).
timezone (Optional[str]) – The timezone in which each date should exist. Supported strings for timezones are the ones provided by the IANA time zone database <https://www.iana.org/time-zones> - e.g. “America/Los_Angeles”.
fmt (Optional[str]) – The date format to use. Defaults to %Y-%m-%d.
end_offset (int) – Extends the partition set by a number of partitions equal to the value passed. If end_offset is 0 (the default), the last partition ends before the current time. If end_offset is 1, the second-to-last partition ends before the current time, and so on.
@weekly_partitioned_config(start_date="2022-03-12")
# creates partitions (2022-03-13-00:00, 2022-03-20-00:00), (2022-03-20-00:00, 2022-03-27-00:00), ...
@weekly_partitioned_config(start_date="2022-03-12", minute_offset=15, hour_offset=3, day_offset=6)
# creates partitions (2022-03-12-03:15, 2022-03-19-03:15), (2022-03-19-03:15, 2022-03-26-03:15), ...
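The day_offset convention (0 = Sunday) differs from Python's `datetime.weekday()`, where 0 is Monday, which is easy to trip over when reasoning about weekly boundaries. The sketch below computes the first weekly boundary on or after start_date in plain Python, assuming naive datetimes; `first_weekly_partition_start` is a hypothetical helper, and it reproduces the first start time in both examples above (2022-03-12 is a Saturday).

```python
from datetime import datetime, timedelta


def first_weekly_partition_start(
    start_date: datetime, day_offset: int = 0, hour_offset: int = 0, minute_offset: int = 0
) -> datetime:
    """Sketch: first weekly boundary on or after start_date.

    day_offset uses 0 = Sunday, ..., 6 = Saturday.
    """
    candidate = start_date.replace(hour=hour_offset, minute=minute_offset, second=0, microsecond=0)
    # Convert Python's Monday=0 numbering to Sunday=0 numbering.
    sunday_based_weekday = (candidate.weekday() + 1) % 7
    candidate += timedelta(days=(day_offset - sunday_based_weekday) % 7)
    if candidate < start_date:
        # Same weekday but earlier in the day than start_date: use next week.
        candidate += timedelta(weeks=1)
    return candidate
```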
Defines run config over a set of monthly partitions.
The decorated function should accept a start datetime and end datetime, which represent the bounds of the date partition the config should delineate.
The decorated function should return a run config dictionary.
The resulting object created by this decorator can be provided to the config argument of a Job. The first partition in the set will start at midnight on the soonest first of the month after start_date. The last partition in the set will end before the current time, unless the end_offset argument is set to a positive number. If day_offset is provided, the start and end date of each partition will fall on the day_offset-th day of the month. If minute_offset and/or hour_offset are used, the start and end times of each partition will be hour_offset:minute_offset of each day.
start_date (Union[datetime.datetime, str]) – The first date in the set of partitions will be midnight on the soonest first of the month following start_date. Can provide in either a datetime or string format.
minute_offset (int) – Number of minutes past the hour to “split” the partition. Defaults to 0.
hour_offset (int) – Number of hours past 00:00 to “split” the partition. Defaults to 0.
day_offset (int) – Day of the month to “split” the partition. Defaults to 1.
timezone (Optional[str]) – The timezone in which each date should exist. Supported strings for timezones are the ones provided by the IANA time zone database <https://www.iana.org/time-zones> - e.g. “America/Los_Angeles”.
fmt (Optional[str]) – The date format to use. Defaults to %Y-%m-%d.
end_offset (int) – Extends the partition set by a number of partitions equal to the value passed. If end_offset is 0 (the default), the last partition ends before the current time. If end_offset is 1, the second-to-last partition ends before the current time, and so on.
@monthly_partitioned_config(start_date="2022-03-12")
# creates partitions (2022-04-01-00:00, 2022-05-01-00:00), (2022-05-01-00:00, 2022-06-01-00:00), ...
@monthly_partitioned_config(start_date="2022-03-12", minute_offset=15, hour_offset=3, day_offset=5)
# creates partitions (2022-04-05-03:15, 2022-05-05-03:15), (2022-05-05-03:15, 2022-06-05-03:15), ...
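The month rollover in the examples above (start_date 2022-03-12 yields a first partition in April) can be checked with a short plain-Python sketch. It assumes naive datetimes and, as a simplification of this sketch only, 1 <= day_offset <= 28 so that every month contains that day; `first_monthly_partition_start` is hypothetical.

```python
from datetime import datetime


def first_monthly_partition_start(
    start_date: datetime, day_offset: int = 1, hour_offset: int = 0, minute_offset: int = 0
) -> datetime:
    """Sketch: first monthly boundary (day `day_offset` of a month at
    hour_offset:minute_offset) on or after start_date.

    Assumes 1 <= day_offset <= 28 so that every month contains that day.
    """
    candidate = start_date.replace(
        day=day_offset, hour=hour_offset, minute=minute_offset, second=0, microsecond=0
    )
    if candidate < start_date:
        # Roll forward to the same day in the following month.
        if candidate.month == 12:
            candidate = candidate.replace(year=candidate.year + 1, month=1)
        else:
            candidate = candidate.replace(month=candidate.month + 1)
    return candidate
```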
A set of partitions where each partition corresponds to a time window.
The provided cron_schedule determines the bounds of the time windows. For example, a cron_schedule of “0 0 * * *” will result in daily partitions that start at midnight and end at midnight of the following day.
The string partition_key associated with each partition corresponds to the start of the partition’s time window.
The first partition in the set will start at the first cron_schedule tick that is equal to or after the given start datetime. The last partition in the set will end before the current time, unless the end_offset argument is set to a positive number.
cron_schedule (str) – Determines the bounds of the time windows.
start (datetime) – The first partition in the set will start at the first cron_schedule tick that is equal to or after this value.
timezone (Optional[str]) – The timezone in which each time should exist. Supported strings for timezones are the ones provided by the IANA time zone database <https://www.iana.org/time-zones> - e.g. “America/Los_Angeles”.
fmt (str) – The date format to use for partition_keys.
end_offset (int) – Extends the partition set by a number of partitions equal to the value passed. If end_offset is 0 (the default), the last partition ends before the current time. If end_offset is 1, the second-to-last partition ends before the current time, and so on.
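Once the cron ticks are known, turning them into keyed windows and applying end_offset is mechanical. The sketch below assumes the sorted ticks have already been generated (for "0 0 * * *" they are consecutive midnights; producing them from a cron string, e.g. with a cron library, is outside this sketch) and treats a window ending exactly at `now` as complete. `windows_from_ticks` is a hypothetical helper, not Dagster code.

```python
from datetime import datetime
from typing import List, Sequence, Tuple


def windows_from_ticks(
    ticks: Sequence[datetime], now: datetime, fmt: str = "%Y-%m-%d", end_offset: int = 0
) -> List[Tuple[str, datetime, datetime]]:
    """Sketch: build (partition_key, start, end) windows from sorted cron ticks.

    Windows that end at or before `now` are kept, plus `end_offset` more
    (there must be enough ticks to cover them).
    """
    windows = [
        # The partition key is the window start, formatted with fmt.
        (ticks[i].strftime(fmt), ticks[i], ticks[i + 1])
        for i in range(len(ticks) - 1)
    ]
    complete = sum(1 for _, _, end in windows if end <= now)
    return windows[: complete + end_offset]
```

With end_offset=1, the window containing the current time is included, so the second-to-last partition is the one that ends before the current time.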
The schedule executes at the cadence specified by the partitioning, but may overwrite the minute/hour/day offset of the partitioning.
This is useful, for example, if you have partitions that span midnight to midnight but want to schedule a job that runs at 2 a.m.
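As an illustration of the 2 a.m. case, the sketch below maps a schedule tick to a daily partition key. It rests on an assumption not stated in this text: that a tick targets the most recent daily partition that has fully ended. `daily_partition_for_tick` is a hypothetical helper using naive datetimes.

```python
from datetime import datetime, timedelta


def daily_partition_for_tick(tick: datetime, fmt: str = "%Y-%m-%d") -> str:
    """Sketch (assumption): a schedule tick targets the latest daily
    partition that has fully ended, i.e. the window ending at the most
    recent midnight."""
    latest_midnight = tick.replace(hour=0, minute=0, second=0, microsecond=0)
    # The completed window is [latest_midnight - 1 day, latest_midnight).
    return (latest_midnight - timedelta(days=1)).strftime(fmt)
```

Under that assumption, shifting the tick from midnight to 2 a.m. leaves the targeted partition unchanged; only the time the run starts moves.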
An interval that is closed at the start and open at the end.
A pendulum datetime that marks the start of the window.
datetime
A pendulum datetime that marks the end of the window.
datetime
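The half-open convention means adjacent windows share a boundary instant without both containing it. The sketch below uses a hypothetical `Window` dataclass, not Dagster's TimeWindow class, to show the containment rule.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Window:
    """Hypothetical stand-in for a time window: closed at start, open at end."""

    start: datetime
    end: datetime

    def contains(self, t: datetime) -> bool:
        # start is included, end is excluded: [start, end)
        return self.start <= t < self.end
```

For two consecutive daily windows, midnight belongs only to the later one, so every instant falls in exactly one partition.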
Takes the cross-product of partitions from two partitions definitions.
For example, with a static partitions definition where the partitions are [“a”, “b”, “c”] and a daily partitions definition, this partitions definition will have the following partitions:
2020-01-01|a
2020-01-01|b
2020-01-01|c
2020-01-02|a
2020-01-02|b
…
partitions_defs (Mapping[str, PartitionsDefinition]) – A mapping of dimension name to partitions definition. The total set of partitions will be the cross-product of the partitions from each PartitionsDefinition.
A sequence of PartitionDimensionDefinition objects, each of which contains a dimension name and a PartitionsDefinition. The total set of partitions will be the cross-product of the partitions from each PartitionsDefinition. This sequence is ordered by dimension name, to ensure consistent ordering of the partitions.
Sequence[PartitionDimensionDefinition]
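The cross-product and name-based ordering can be sketched with `itertools.product`. This is an illustration, not Dagster's implementation; `multi_partition_keys` is hypothetical, and naming the daily dimension `date` and the static one `letter` is an assumption chosen so the output reproduces the listing in the example above.

```python
from itertools import product
from typing import Dict, List, Sequence


def multi_partition_keys(dimensions: Dict[str, Sequence[str]]) -> List[str]:
    """Sketch: cross-product of per-dimension keys, joined with "|".

    Dimensions are sorted by name so the key layout is stable regardless of
    the order in which the mapping was built.
    """
    names = sorted(dimensions)
    return ["|".join(combo) for combo in product(*(dimensions[n] for n in names))]
```

For example, `multi_partition_keys({"letter": ["a", "b", "c"], "date": ["2020-01-01", "2020-01-02"]})` yields six keys beginning with `2020-01-01|a`.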
A multi-dimensional partition key stores the partition key for each dimension. It subclasses str, so the partition key type remains a string.
It provides additional methods to access the partition key for each dimension. Its string representation joins the partition keys of all dimensions with a pipe (|), with dimensions ordered by name to ensure a consistent representation.
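The idea of a key that is both a plain string and a per-dimension mapping can be sketched with a `str` subclass. `MultiKey` and its `keys_by_dimension` attribute are hypothetical names chosen for illustration; this is not Dagster's class.

```python
from typing import Dict


class MultiKey(str):
    """Hypothetical str subclass: the string form joins per-dimension keys
    with "|" in dimension-name order, while the mapping stays accessible."""

    keys_by_dimension: Dict[str, str]

    def __new__(cls, keys_by_dimension: Dict[str, str]) -> "MultiKey":
        # Sort by dimension name so the string form is deterministic.
        ordered = sorted(keys_by_dimension.items())
        instance = super().__new__(cls, "|".join(key for _, key in ordered))
        instance.keys_by_dimension = dict(ordered)
        return instance
```

Because it is still a `str`, such a key can be stored or compared anywhere an ordinary partition key is expected.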
Creates a schedule from a time window-partitioned job.
The schedule executes at the cadence specified by the partitioning of the given job.
Defines a correspondence between the partitions in an asset and the partitions in an asset that it depends on.
Overriding PartitionMapping outside of Dagster is not supported. The abstract methods of this class may change at any time.
Returns the range of partition keys in the downstream asset that use the data in the given partition key range of the upstream asset.
upstream_partition_key_range (PartitionKeyRange) – The range of partition keys in the upstream asset.
downstream_partitions_def (PartitionsDefinition) – The partitions definition for the downstream asset.
upstream_partitions_def (PartitionsDefinition) – The partitions definition for the upstream asset.
Returns the subset of partition keys in the downstream asset that use the data in the given partition key subset of the upstream asset.
upstream_partitions_subset (Union[PartitionKeyRange, PartitionsSubset]) – The subset of partition keys in the upstream asset.
downstream_partitions_def (PartitionsDefinition) – The partitions definition for the downstream asset.
Returns the range of partition keys in the upstream asset that include data necessary to compute the contents of the given partition key range in the downstream asset.
downstream_partition_key_range (PartitionKeyRange) – The range of partition keys in the downstream asset.
downstream_partitions_def (PartitionsDefinition) – The partitions definition for the downstream asset.
upstream_partitions_def (PartitionsDefinition) – The partitions definition for the upstream asset.
Returns the subset of partition keys in the upstream asset that include data necessary to compute the contents of the given partition key subset in the downstream asset.
downstream_partitions_subset (Optional[PartitionsSubset]) – The subset of partition keys in the downstream asset.
upstream_partitions_def (PartitionsDefinition) – The partitions definition for the upstream asset.
The default mapping between two TimeWindowPartitionsDefinitions.
A partition in the downstream partitions definition is mapped to all partitions in the upstream asset whose time windows overlap it.
This means that, if the upstream and downstream partitions definitions share the same time period, then this mapping is essentially the identity partition mapping, plus conversion of datetime formats.
If the upstream time period is coarser than the downstream time period, then each partition in the downstream asset will map to a single (larger) upstream partition. E.g. if the downstream is hourly and the upstream is daily, then each hourly partition in the downstream will map to the daily partition in the upstream that contains that hour.
If the upstream time period is finer than the downstream time period, then each partition in the downstream asset will map to multiple upstream partitions. E.g. if the downstream is daily and the upstream is hourly, then each daily partition in the downstream asset will map to the 24 hourly partitions in the upstream that occur on that day.
If not 0, then the starts of the upstream windows are shifted by this offset relative to the starts of the downstream windows. For example, if start_offset=-1 and end_offset=0, then the downstream partition “2022-07-04” would map to the upstream partitions “2022-07-03” and “2022-07-04”. Only permitted to be non-zero when the upstream and downstream PartitionsDefinitions are the same. Defaults to 0.
int
If not 0, then the ends of the upstream windows are shifted by this offset relative to the ends of the downstream windows. For example, if start_offset=0 and end_offset=1, then the downstream partition “2022-07-04” would map to the upstream partitions “2022-07-04” and “2022-07-05”. Only permitted to be non-zero when the upstream and downstream PartitionsDefinitions are the same. Defaults to 0.
int
Examples
from dagster import DailyPartitionsDefinition, TimeWindowPartitionMapping, AssetIn, asset

partitions_def = DailyPartitionsDefinition(start_date="2020-01-01")

@asset(partitions_def=partitions_def)
def asset1():
    ...

@asset(
    partitions_def=partitions_def,
    ins={
        "asset1": AssetIn(
            partition_mapping=TimeWindowPartitionMapping(start_offset=-1)
        )
    }
)
def asset2(asset1):
    ...
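The overlap rule and the offset attributes described above can be checked with plain datetime arithmetic. The two helpers below are hypothetical sketches, not Dagster's implementation: the first lists the hourly upstream keys that overlap one daily downstream partition (naive datetimes, so daylight-saving transitions are ignored, and the hourly key format is an assumption), and the second applies start_offset and end_offset for two identical daily definitions.

```python
from datetime import datetime, timedelta
from typing import List


def upstream_hourly_keys_for_day(day: str) -> List[str]:
    """Sketch: hourly upstream keys whose windows overlap one daily
    downstream partition (naive datetimes; DST is ignored)."""
    start = datetime.strptime(day, "%Y-%m-%d")
    return [(start + timedelta(hours=h)).strftime("%Y-%m-%d-%H:%M") for h in range(24)]


def shifted_daily_keys(day: str, start_offset: int = 0, end_offset: int = 0) -> List[str]:
    """Sketch: shift a daily window's start and end by whole partitions,
    then list the daily upstream keys inside the shifted window."""
    base = datetime.strptime(day, "%Y-%m-%d")
    start = base + timedelta(days=start_offset)
    end = base + timedelta(days=1 + end_offset)
    keys: List[str] = []
    cursor = start
    while cursor < end:
        keys.append(cursor.strftime("%Y-%m-%d"))
        cursor += timedelta(days=1)
    return keys
```

`shifted_daily_keys("2022-07-04", start_offset=-1)` returns the two keys from the start_offset description, which is the mapping the `asset2` example above requests.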