Docker Deployment¶
By default, databricks-bundle-decorators deploys tasks as
`python_wheel_task` jobs that install a `.whl` artifact at task startup.
An alternative is to pre-install your package (and all its dependencies)
into a custom Docker image. This eliminates wheel installation time
and gives you full control over the runtime environment.
Quick start¶
```shell
uv init my-pipeline && cd my-pipeline
uv add databricks-bundle-decorators
uv run dbxdec init --docker
```
The `--docker` flag generates a pipeline example with `libraries=[]`
and a `databricks.yaml` without the `artifacts` section, since the
package is pre-installed in the Docker image rather than uploaded as a
wheel.
How it works¶
The key difference from the standard wheel deployment is the `libraries`
parameter on `@job`:
```python
from databricks_bundle_decorators import job, job_cluster, task

docker_cluster = job_cluster(
    name="docker_cluster",
    spark_version="16.4.x-scala2.12",
    node_type_id="Standard_DS3_v2",
    num_workers=2,
    docker_image={
        "url": "my-registry.io/my-pipeline:latest",
    },
)

@job(
    cluster=docker_cluster,
    libraries=[],  # package is pre-installed in the image
)
def my_pipeline():
    @task
    def extract():
        ...

    @task
    def transform(data):
        ...

    raw = extract()
    transform(raw)
```
libraries=[]¶
When `libraries` is not specified (the default), every generated task
includes `Library(whl="dist/*.whl")`, so Databricks installs the wheel
before running the task. Setting `libraries=[]` tells the framework to
skip library installation entirely, because the package is already
available in the container.
`libraries` parameter reference (per-job):

| Value | Behavior |
|---|---|
| `None` (default) | Use `default_libraries` from `generate_resources()` (defaults to `dist/*.whl`) |
| `[]` | No libraries; package is pre-installed in the Docker image |
| `[Library(...)]` | Custom libraries, e.g. PyPI packages or Maven JARs |
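The three cases in the table reduce to a simple fallback rule: an explicit value (including the empty list) wins, and only `None` falls back to the project-wide default. A minimal sketch of that selection logic (the helper name `resolve_libraries` is hypothetical, not part of the library's API):

```python
def resolve_libraries(job_libraries, default_libraries):
    """Sketch of the per-job library selection described above.

    job_libraries:     the `libraries=` argument passed to @job (or None)
    default_libraries: the fallback configured on generate_resources()
    """
    if job_libraries is None:
        # Not specified: fall back to the project-wide default (dist/*.whl)
        return default_libraries
    # An explicit value wins; [] means "install nothing" (Docker pre-installed)
    return job_libraries

print(resolve_libraries(None, ["dist/*.whl"]))  # falls back to the default wheel
print(resolve_libraries([], ["dist/*.whl"]))    # suppresses installation entirely
```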
Tip
If all your jobs use a non-standard wheel location, set
`default_libraries` on `generate_resources()` instead of repeating
`libraries=` on every `@job`. See Bundle Configuration.
Dockerfile example¶
Your Docker image must have:
- The Databricks runtime base image
- Your pipeline package installed (with `databricks-bundle-decorators`)
- The `dbxdec-run` entry point available on `$PATH`
```dockerfile
FROM databricksruntime/standard:16.4.x-scala2.12

# Install your pipeline package (includes databricks-bundle-decorators)
COPY dist/*.whl /tmp/
RUN pip install /tmp/*.whl && rm /tmp/*.whl
```
Build and push:
```shell
uv build --wheel
docker build -t my-registry.io/my-pipeline:latest .
docker push my-registry.io/my-pipeline:latest
```
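If you would rather not build the wheel on the host first, a multi-stage build can produce it inside Docker instead. A sketch, assuming your project builds with uv as in the quick start; the `ghcr.io/astral-sh/uv` builder image and its tag are assumptions, so adjust them to your toolchain:

```dockerfile
# Stage 1: build the wheel inside the image (host needs no Python toolchain)
FROM ghcr.io/astral-sh/uv:python3.12-bookworm AS build
WORKDIR /src
COPY . .
RUN uv build --wheel

# Stage 2: install the wheel into the Databricks runtime base image
FROM databricksruntime/standard:16.4.x-scala2.12
COPY --from=build /src/dist/*.whl /tmp/
RUN pip install /tmp/*.whl && rm /tmp/*.whl
```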
databricks.yaml for Docker¶
With Docker deployment you typically don't need the `artifacts` section,
since no wheel is uploaded during `databricks bundle deploy`:
```yaml
bundle:
  name: my-pipeline

# No artifacts section needed; the package is in the Docker image.

python:
  venv_path: .venv
  resources:
    - 'resources:load_resources'

targets:
  dev:
    mode: development
    workspace:
      host: https://<your-workspace>.azuredatabricks.net/
```
Mixing Docker and wheel tasks¶
You can mix deployment strategies across different jobs in the same
project. Each `@job` has its own `libraries` setting:
```python
# This job uses the standard wheel deployment
@job(cluster=wheel_cluster)
def standard_job():
    ...

# This job uses Docker
@job(cluster=docker_cluster, libraries=[])
def docker_job():
    ...
```
Important notes¶
- **Entry point discovery:** The `dbxdec-run` console script must be on `$PATH` inside the container. This happens automatically when you `pip install` the package that depends on `databricks-bundle-decorators`.
- **`resources/__init__.py` still runs locally:** The `load_resources()` entry point is invoked by `databricks bundle deploy` on your local machine (not inside the Docker image). Make sure your local environment has the package installed so code generation works.
- **Image registry access:** Your Databricks workspace must have network access to pull from your container registry. Configure authentication via workspace-level Docker credentials.
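A quick way to catch a broken image before deploying is to check that the console script is actually resolvable inside the container. A minimal sketch using only the standard library (the CI wiring in the comment is an illustration, not part of the framework):

```python
import shutil

def entry_point_on_path(name: str) -> bool:
    """Return True if a console script with this name is resolvable on $PATH."""
    return shutil.which(name) is not None

# Inside the Docker image this should be True for "dbxdec-run". For a CI
# smoke test, you could run the same check in the built container, e.g.:
#   docker run --rm my-registry.io/my-pipeline:latest \
#       python -c "import shutil, sys; sys.exit(0 if shutil.which('dbxdec-run') else 1)"
```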