# Docker Deployment

Run tasks inside a custom Docker container with pre-installed dependencies.

```python
from databricks_bundle_decorators import job, job_cluster, task

# Cluster backed by a custom Docker image from a container registry
docker_cluster = job_cluster(
    name="docker_cluster",
    spark_version="16.4.x-scala2.12",
    node_type_id="Standard_DS3_v2",
    num_workers=2,
    docker_image={"url": "my-registry.io/my-pipeline:latest"},
)

@job(cluster=docker_cluster, libraries=[])
def docker_pipeline():
    @task
    def extract():
        print("Running inside custom Docker image")

    @task
    def transform(raw_data):
        print("All heavy deps are pre-installed")

    e = extract()
    transform(raw_data=e)
```

`libraries=[]` tells the framework to skip attaching `dist/*.whl`; the package is already installed in the container image.
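For this to work, the image referenced in `docker_image` must already contain the package and its dependencies. A minimal Dockerfile sketch, where the base image, package name, and wheel filename are assumptions for illustration rather than part of the framework:

```dockerfile
# Assumed base image: Databricks Container Services requires an image
# that meets its base requirements (e.g. databricksruntime/standard).
FROM databricksruntime/standard:latest

# Install the built wheel at image build time so nothing needs to be
# attached as a cluster library at deploy time.
COPY dist/my_pipeline-0.1.0-py3-none-any.whl /tmp/
RUN pip install /tmp/my_pipeline-0.1.0-py3-none-any.whl
```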

| `libraries` value | Behaviour |
| --- | --- |
| `None` (default) | Attach `dist/*.whl` (standard wheel deployment) |
| `[]` | No libraries; the package is pre-installed in the Docker image |
| `[Library(...)]` | Custom libraries, e.g. PyPI packages or Maven JARs |
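The three cases in the table can be sketched as a small resolver. This is a hypothetical illustration of the selection logic, not the framework's actual implementation; the function name and `dist_dir` parameter are assumptions:

```python
from pathlib import Path

def resolve_libraries(libraries, dist_dir="dist"):
    """Sketch of how a libraries argument might be resolved (assumed logic).

    - None  -> attach every built wheel found in dist/
    - []    -> attach nothing (package is baked into the Docker image)
    - [...] -> attach exactly the libraries the caller listed
    """
    if libraries is None:
        # Default: pick up wheels produced by the build step
        return sorted(str(p) for p in Path(dist_dir).glob("*.whl"))
    # Covers both [] and explicit library lists
    return list(libraries)
```

Passing an empty list is an explicit "attach nothing", which is distinct from the default of `None`.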

See Docker Deployment for the full guide including Dockerfile and databricks.yaml examples.