Backfill Definitions
databricks_bundle_decorators.backfill.BackfillDef
Bases: ABC
Base class for backfill definitions.
Subclasses declare the universe of valid backfill keys for
enumeration. The dbxdec backfill CLI uses these to
generate backfill_key values.
keys(start=None, end=None)
abstractmethod
Enumerate concrete backfill key strings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `start` | `str \| None` | Override the start bound (inclusive). Must use the same format as the definition's keys. | `None` |
| `end` | `str \| None` | Override the end bound (inclusive). Must use the same format as the definition's keys. | `None` |
Source code in src/databricks_bundle_decorators/backfill.py
current_key()
Return the backfill key for the current point in time.
Used as a fallback when a job with a backfill definition is
triggered without an explicit backfill_key (e.g. by a cron
schedule or file-arrival trigger).
Time-based subclasses return the key matching "now" in their
configured timezone. StaticBackfill returns None
because there is no sensible default.
databricks_bundle_decorators.backfill.DailyBackfill(start_date, end_date=None, tz='UTC', lookback=0, collect_schedule_gaps=False, data_lag=0)
dataclass
Bases: BackfillDef
One key per calendar day.
Keys are ISO-8601 dates: YYYY-MM-DD.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `start_date` | `str` | First key (inclusive), e.g. | required |
| `end_date` | `str \| None` | Last key (inclusive). Defaults to today in tz. | `None` |
| `tz` | `str` | IANA timezone name (e.g. | `'UTC'` |
| `lookback` | `int` | Number of additional prior keys to include. For example, | `0` |
| `collect_schedule_gaps` | `bool` | When | `False` |
| `data_lag` | `int` | Number of periods to subtract from the default end bound. Use | `0` |
current_key()
Today's date in the configured timezone, shifted by data_lag.
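The inclusive bounds and the `data_lag` shift can be illustrated with a small standard-library sketch. This is not the library's implementation: `daily_keys` and `lagged_current_key` are hypothetical helper names, and the sketch assumes `data_lag` simply moves the default key back by whole days.

```python
from datetime import date, timedelta


def daily_keys(start: str, end: str) -> list[str]:
    """Enumerate YYYY-MM-DD keys from start to end, both inclusive."""
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    days = (last - first).days
    # range(days + 1) makes the end bound inclusive; an end before the
    # start yields an empty list.
    return [(first + timedelta(days=i)).isoformat() for i in range(days + 1)]


def lagged_current_key(today: date, data_lag: int = 0) -> str:
    """Key for 'today', moved back by data_lag days (assumed semantics)."""
    return (today - timedelta(days=data_lag)).isoformat()


print(daily_keys("2024-02-27", "2024-03-01"))
print(lagged_current_key(date(2024, 3, 1), data_lag=1))
```

With these inputs the range crosses the leap day, so the first call yields four keys ending in `2024-03-01`, and the lagged key is `2024-02-29`.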
databricks_bundle_decorators.backfill.WeeklyBackfill(start_date, end_date=None, tz='UTC', lookback=0, collect_schedule_gaps=False, data_lag=0)
dataclass
Bases: BackfillDef
One key per ISO week.
Keys are ISO week dates: YYYY-WNN (e.g. "2024-W03").
The default end_date is the Monday of the current ISO week.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `start_date` | `str` | First key (inclusive), e.g. | required |
| `end_date` | `str \| None` | Last key (inclusive). Defaults to the current ISO week. | `None` |
| `tz` | `str` | IANA timezone name. Used to determine "today" when end_date is omitted. | `'UTC'` |
| `lookback` | `int` | Number of additional prior keys (weeks) to include. | `0` |
| `collect_schedule_gaps` | `bool` | When | `False` |
| `data_lag` | `int` | Number of periods (weeks) to subtract from the default end bound. Use | `0` |
current_key()
Current ISO week in the configured timezone, shifted by data_lag.
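The `YYYY-WNN` key format maps directly onto Python's ISO calendar helpers. The sketch below is an illustration only, not the library's code; `week_key`, `week_monday`, and `weekly_keys` are hypothetical names. It shows how a date is turned into a week key, how a key resolves to the Monday of that week, and how a range behaves across a year boundary.

```python
from datetime import date, timedelta


def week_key(d: date) -> str:
    """Format a date as its ISO week key, e.g. 2024-W03."""
    iso = d.isocalendar()  # (ISO year, ISO week, ISO weekday)
    return f"{iso[0]}-W{iso[1]:02d}"


def week_monday(key: str) -> date:
    """Return the Monday that starts the given ISO week."""
    year, week = key.split("-W")
    return date.fromisocalendar(int(year), int(week), 1)


def weekly_keys(start: str, end: str) -> list[str]:
    """Enumerate week keys from start to end, both inclusive."""
    cur, last = week_monday(start), week_monday(end)
    keys = []
    while cur <= last:
        keys.append(week_key(cur))
        cur += timedelta(weeks=1)
    return keys


print(week_key(date(2024, 1, 17)))
print(week_monday("2024-W03"))
print(weekly_keys("2024-W51", "2025-W02"))
```

Note that the ISO year can differ from the calendar year: 30 December 2024 belongs to `2025-W01`, which is why the key is built from `isocalendar()` rather than from `d.year`.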
databricks_bundle_decorators.backfill.MonthlyBackfill(start_date, end_date=None, tz='UTC', lookback=0, collect_schedule_gaps=False, data_lag=0)
dataclass
Bases: BackfillDef
One key per calendar month.
Keys are ISO-8601 dates pinned to the first of the month:
YYYY-MM-01 (e.g. "2024-01-01").
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `start_date` | `str` | First key (inclusive), e.g. | required |
| `end_date` | `str \| None` | Last key (inclusive). Defaults to the current month. | `None` |
| `tz` | `str` | IANA timezone name. Used to determine "today" when end_date is omitted. | `'UTC'` |
| `lookback` | `int` | Number of additional prior keys (months) to include. | `0` |
| `collect_schedule_gaps` | `bool` | When | `False` |
| `data_lag` | `int` | Number of periods (months) to subtract from the default end bound. Use | `0` |
current_key()
First day of the current month in the configured timezone, shifted by data_lag.
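Month keys are always pinned to day 1, so stepping between them is month arithmetic rather than day arithmetic. The following sketch uses only the standard library and is not taken from the library's source; `monthly_keys` and `shift_months` are hypothetical helpers, and `shift_months` assumes `data_lag` moves the key back by whole months.

```python
from datetime import date


def monthly_keys(start: str, end: str) -> list[str]:
    """Enumerate YYYY-MM-01 keys from start to end, both inclusive."""
    first = date.fromisoformat(start).replace(day=1)
    last = date.fromisoformat(end).replace(day=1)
    keys = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        keys.append(date(year, month, 1).isoformat())
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return keys


def shift_months(key: str, months: int) -> str:
    """Move a month key back by the given number of months."""
    d = date.fromisoformat(key)
    index = d.year * 12 + (d.month - 1) - months  # months since year 0
    return date(index // 12, index % 12 + 1, 1).isoformat()


print(monthly_keys("2023-11-01", "2024-02-01"))
print(shift_months("2024-01-01", 2))
```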
databricks_bundle_decorators.backfill.HourlyBackfill(start_date, end_date=None, tz='UTC', lookback=0, collect_schedule_gaps=False, data_lag=0)
dataclass
Bases: BackfillDef
One key per hour.
Keys are truncated ISO-8601 timestamps: YYYY-MM-DDTHH
(e.g. "2024-01-01T00").
All enumeration is performed in the specified timezone (default UTC) so that daylight-saving transitions are handled correctly — hours that don't exist are skipped, and ambiguous hours appear once.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `start_date` | `str` | First key (inclusive), e.g. | required |
| `end_date` | `str \| None` | Last key (inclusive). Defaults to the current hour in tz. | `None` |
| `tz` | `str` | IANA timezone name (e.g. | `'UTC'` |
| `lookback` | `int` | Number of additional prior keys (hours) to include. | `0` |
| `collect_schedule_gaps` | `bool` | When | `False` |
| `data_lag` | `int` | Number of periods (hours) to subtract from the default end bound. Use | `0` |
current_key()
Current hour in the configured timezone, shifted by data_lag.
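One way to obtain the daylight-saving behaviour described above (non-existent hours skipped, ambiguous hours listed once) is to step through real UTC instants and format each one as local wall-clock time. The sketch below shows that technique with the standard library; it is an assumption about how such enumeration can be done, not the library's actual code, and `hourly_keys` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone, tzinfo

FMT = "%Y-%m-%dT%H"  # truncated ISO-8601 timestamp, e.g. 2024-01-01T00


def hourly_keys(start: str, end: str, tz: tzinfo = timezone.utc) -> list[str]:
    """Enumerate hourly keys in tz from start to end, both inclusive."""
    # Interpret the bounds as wall-clock time in tz, then move to UTC.
    cur = datetime.strptime(start, FMT).replace(tzinfo=tz).astimezone(timezone.utc)
    last = datetime.strptime(end, FMT).replace(tzinfo=tz).astimezone(timezone.utc)
    keys: list[str] = []
    while cur <= last:
        key = cur.astimezone(tz).strftime(FMT)
        # A repeated wall-clock hour (clocks set back) produces the same
        # key twice in a row; keep it only once.
        if not keys or keys[-1] != key:
            keys.append(key)
        cur += timedelta(hours=1)
    return keys


print(hourly_keys("2024-01-01T22", "2024-01-02T01"))
```

Passing a DST-aware zone such as `zoneinfo.ZoneInfo("Europe/Berlin")` as `tz` should show the effect: for the range `2024-03-31T00` to `2024-03-31T04` the key `2024-03-31T02` never appears, because that wall-clock hour does not exist on that date.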
databricks_bundle_decorators.backfill.StaticBackfill(keys)
dataclass
Bases: BackfillDef
A fixed set of backfill keys.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `keys` | `list[str]` | The complete list of valid backfill keys. | required |
Example

```python
from databricks_bundle_decorators.backfill import StaticBackfill

StaticBackfill(keys=["us", "eu", "jp"])
```
databricks_bundle_decorators.backfill.get_backfill_key(*, validate=True)
Return the raw backfill key for the current job run.
Reads the backfill_key job parameter and optionally validates
it against the job's BackfillDef boundaries.
When the parameter is missing or empty and the job has a
time-based BackfillDef, the key is auto-derived from the
current time (e.g. today's date for DailyBackfill) and a
warning is logged. This allows cron-triggered and file-arrival
runs to work without explicitly supplying the key.
For time-based backfills the key is an ISO-8601 date/time string;
for StaticBackfill it is one of the declared keys (e.g.
"us", "eu").
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `validate` | `bool` | When | `True` |
Raises:
| Type | Description |
|---|---|
| `RuntimeError` | If |
| `ValueError` | If validate is |
Returns:
| Type | Description |
|---|---|
| `str` | The raw backfill key string. |
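The resolution order described above (explicit parameter first, then a key derived from the current time with a warning, then optional validation) can be sketched as a plain function. Everything here is hypothetical and for illustration only: `resolve_backfill_key` and its arguments are not part of the library, and the exact conditions under which the real function raises `RuntimeError` or `ValueError` are assumptions.

```python
import logging
from typing import Callable, Optional, Sequence

logger = logging.getLogger(__name__)


def resolve_backfill_key(
    param: Optional[str],
    current_key: Callable[[], Optional[str]],
    valid_keys: Sequence[str],
    validate: bool = True,
) -> str:
    """Pick the key for a run: explicit parameter, else a derived default."""
    key = param or None  # treat an empty string like a missing parameter
    if key is None:
        key = current_key()
        if key is None:
            # Assumption: with no explicit key and no time-based default
            # (the StaticBackfill case) there is nothing to fall back on.
            raise RuntimeError("no backfill_key supplied and none can be derived")
        logger.warning("backfill_key not set; derived %r from the current time", key)
    if validate and key not in valid_keys:
        # Assumption: validation means membership in the enumerated keys.
        raise ValueError(f"backfill_key {key!r} is outside the definition")
    return key


valid = ["2024-01-01", "2024-01-02", "2024-01-03"]
print(resolve_backfill_key("2024-01-02", lambda: "2024-01-03", valid))
print(resolve_backfill_key("", lambda: "2024-01-03", valid))
```

The second call mimics a cron-triggered run: no key is passed, so the default is used and a warning is logged.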
databricks_bundle_decorators.backfill.get_backfill_keys(*, validate=True)
Return all backfill keys for the current run.
When neither lookback nor collect_schedule_gaps is
configured, this returns a single-element list equivalent to
[get_backfill_key()].
With lookback=N, the result includes N prior keys plus the
current key. This applies in all run modes (scheduled and
explicit backfill).
With collect_schedule_gaps=True, keys between the previous
cron fire date and the current key are included. This only
applies to auto-derived keys (scheduled runs); when
backfill_key is explicitly provided (via dbxdec backfill
--keys), schedule gap logic is bypassed.
When both are configured the result is the sorted union.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `validate` | `bool` | When | `True` |
Returns:
| Type | Description |
|---|---|
| `list[str]` | Sorted list of backfill key strings (ascending). |
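How `lookback` and schedule gaps combine into a sorted union can be shown with daily keys. This is a sketch under stated assumptions rather than the library's logic: `run_keys` and `previous_fire` are hypothetical names, and the gap is assumed to cover the days strictly after the previous cron fire date, since that date itself would already have been processed.

```python
from datetime import date, timedelta
from typing import Optional


def run_keys(
    current: str,
    lookback: int = 0,
    previous_fire: Optional[str] = None,
) -> list[str]:
    """Keys for one run: current key, lookback keys, and schedule gaps."""
    cur = date.fromisoformat(current)
    keys = {cur}
    # lookback=N adds the N keys immediately before the current one.
    keys.update(cur - timedelta(days=i) for i in range(1, lookback + 1))
    # Schedule gaps: days missed since the previous fire (assumed exclusive).
    if previous_fire is not None:
        day = date.fromisoformat(previous_fire) + timedelta(days=1)
        while day < cur:
            keys.add(day)
            day += timedelta(days=1)
    # The set removes overlap between the two sources; ISO dates sort
    # chronologically as strings.
    return sorted(d.isoformat() for d in keys)


print(run_keys("2024-01-10"))
print(run_keys("2024-01-10", lookback=2))
print(run_keys("2024-01-10", lookback=1, previous_fire="2024-01-06"))
```

In the last call the lookback contributes `2024-01-09` and the gap contributes `2024-01-07` through `2024-01-09`; the overlapping key appears once in the result.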
databricks_bundle_decorators.backfill.get_run_logical_date(*, validate=True)
Return the backfill key parsed as a timezone-aware datetime.
Convenience wrapper around get_backfill_key for time-based
backfills (DailyBackfill, WeeklyBackfill, etc.). Not
suitable for StaticBackfill with non-date keys — use
get_backfill_key instead.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `validate` | `bool` | When | `True` |
Raises:
| Type | Description |
|---|---|
| `RuntimeError` | If |
| `ValueError` | If the key cannot be parsed as an ISO-8601 date/time, or if validate is |
Returns:
| Type | Description |
|---|---|
| `datetime` | Timezone-aware datetime representing the backfill key. |
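Turning each key format into a timezone-aware `datetime` might look like the sketch below. It is illustrative only: `parse_key` is a hypothetical helper, and it assumes that week keys resolve to the Monday of the week at midnight and that the caller supplies the timezone to attach.

```python
from datetime import date, datetime, time, timezone, tzinfo


def parse_key(key: str, tz: tzinfo = timezone.utc) -> datetime:
    """Parse a daily, weekly, monthly, or hourly key into an aware datetime."""
    if "-W" in key:
        # Weekly key such as 2024-W03: use the Monday of that ISO week.
        year, week = key.split("-W")
        day = date.fromisocalendar(int(year), int(week), 1)
        return datetime.combine(day, time(0), tzinfo=tz)
    if "T" in key:
        # Hourly key such as 2024-01-01T00.
        return datetime.strptime(key, "%Y-%m-%dT%H").replace(tzinfo=tz)
    # Daily and monthly keys are plain ISO dates; anything else
    # (for example a static key like "us") raises ValueError here.
    return datetime.combine(date.fromisoformat(key), time(0), tzinfo=tz)


print(parse_key("2024-01-15"))
print(parse_key("2024-W03"))
print(parse_key("2024-01-01T05"))
```

A non-date key such as `"us"` fails in `date.fromisoformat`, which is consistent with the advice above to use `get_backfill_key` for static keys.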
Cross-partition reads
databricks_bundle_decorators.decorators.all_partitions(proxy)
Wrap a TaskProxy so the downstream task receives all partitions.
Use inside a @job body to indicate that the downstream task
should read the entire dataset from the upstream task, across
all partitions, rather than filtering to the current backfill_key.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `proxy` | `TaskProxy` | A | required |
Returns:
| Type | Description |
|---|---|
| `_AllPartitionsProxy` | A wrapped proxy that records the all-partitions flag on the dependency edge. |
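Example

```python
from databricks_bundle_decorators.backfill import DailyBackfill
from databricks_bundle_decorators.decorators import all_partitions

# `job`, `task`, and the `io` IO manager are assumed to be imported or
# defined elsewhere in the project.


@job(backfill=DailyBackfill(start_date="2024-01-01"))
def my_pipeline():
    @task(io_manager=io)
    def extract(): ...

    @task
    def aggregate(data): ...

    data = extract()
    # Read every partition written by `extract`, not only the one
    # matching the current backfill_key.
    aggregate(all_partitions(data))
```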