ploosh.
Documentation
Databricks setup
This guide explains how to run Ploosh on Databricks using Spark mode.
Step 1: Install Ploosh
In the first cell of your Databricks notebook:
%pip install ploosh
Step 2: Restart Python
Databricks requires a Python restart after installing new packages:
dbutils.library.restartPython()
Step 3: Execute Ploosh
from ploosh import execute_casesroot_folder = "/Workspace/Shared"
execute_cases(
cases=f"{root_folder}/cases",
connections=f"{root_folder}/connections.yaml",
spark_session=spark,
pathoutput=f"{rootfolder}/output"
)
File organization
Organize your files in the Databricks workspace:
/Workspace/Shared/
├── connections.yaml → Connection definitions
├── cases/ → Test case YAML files
│ ├── validation.yaml
│ └── quality_checks.yaml
└── output/ → Generated results
└── json/
├── test_results.json
└── test_results/
└── *.xlsx
Using Unity Catalog
When using Databricks Unity Catalog, you can query tables directly with the sql_spark connector:
Test catalog data:
source:
type: sql_spark
query: |
SELECT *
FROM mycatalog.myschema.employees
WHERE department = 'Engineering'
expected:
type: csv_spark
path: /Workspace/Shared/expected/employees.csv
Using variables
Pass variables to Ploosh for dynamic configuration:
execute_cases(
cases=f"{root_folder}/cases",
connections=f"{root_folder}/connections.yaml",
spark_session=spark,
variables={
"env": "production",
"db_password": dbutils.secrets.get("my-scope", "db-password")
}
)
Scheduling
Use Databricks Jobs to schedule Ploosh execution:
- Create a new Job
- Add a Notebook task pointing to your Ploosh notebook
- Configure schedule (cron, trigger, or manual)
- Optionally add parameters to override notebook widgets