SQL (Spark)

This connector is used to execute Spark SQL queries. It is the primary connector for querying Lakehouse tables in Microsoft Fabric or registered tables in Databricks.

⚠️ A Spark connector can only be used with another Spark connector. It is not possible to mix Spark and native connectors in the same test case.

See Spark mode overview for more information.
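The following minimal Python sketch makes the rule concrete. It is a pre-check you could run over test case definitions, assuming they are already loaded into Python dictionaries (for example with a YAML parser). It is not part of ploosh. The helper names `is_spark_connector` and `mixes_engines` are hypothetical. It also assumes that Spark connector type names end in `_spark`, as `sql_spark` and `csv_spark` on this page do, and that `csv` is a native connector type.

```python
# Hypothetical pre-check, not part of ploosh: flags test cases that mix
# Spark and native connectors.
# Assumption: Spark connector type names end in "_spark".


def is_spark_connector(connector_type: str) -> bool:
    """Return True if the connector type name looks like a Spark connector."""
    return connector_type.endswith("_spark")


def mixes_engines(test_case: dict) -> bool:
    """Return True if source and expected do not use the same engine."""
    source_type = test_case["source"]["type"]
    expected_type = test_case["expected"]["type"]
    return is_spark_connector(source_type) != is_spark_connector(expected_type)


# Spark on both sides: allowed.
valid_case = {
    "source": {"type": "sql_spark", "query": "SELECT 1"},
    "expected": {"type": "csv_spark", "path": "/lakehouse/default/Files/expected/x.csv"},
}

# Spark source with an (assumed) native "csv" expected side: not allowed.
invalid_case = {
    "source": {"type": "sql_spark", "query": "SELECT 1"},
    "expected": {"type": "csv", "path": "expected.csv"},
}

print(mixes_engines(valid_case))    # False
print(mixes_engines(invalid_case))  # True
```

Comparing the two booleans flags a mix in either direction, whether the Spark connector is on the source side or the expected side.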

Connection configuration

No connection is required by this connector.

Test case configuration

Name	Mandatory	Default	Description
query	yes		Spark SQL query to execute

Example

Example SQL Spark:
  source:
    type: sql_spark
    query: |
      SELECT department, COUNT(*) AS count
      FROM lakehouse.employees
      GROUP BY department
  expected:
    type: csv_spark
    path: /lakehouse/default/Files/expected/department_counts.csv
    header: true
    inferSchema: true
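In this example the query result is compared with a CSV file. The `header` and `inferSchema` settings appear to correspond to Spark's CSV reader options of the same name. With `header`, the first line supplies the column names. With `inferSchema`, column types are inferred from the data instead of every column being read as a string. This matters here because `COUNT(*)` returns a numeric column. The stdlib-only Python sketch below approximates that behaviour so you can check what `department_counts.csv` should contain. The sample rows and the helper names `infer_column` and `read_expected` are hypothetical. The inference is a simplified stand-in for Spark's, which also recognizes types such as booleans and timestamps.

```python
import csv
import io

# Hypothetical contents of department_counts.csv (illustrative data only).
SAMPLE_CSV = """department,count
Finance,12
Sales,30
"""


def infer_column(values):
    """Pick one type per column, loosely like Spark: int, then float, else str."""
    for cast in (int, float):
        try:
            return [cast(v) for v in values]
        except ValueError:
            continue
    return list(values)


def read_expected(text, header=True, infer_schema=True):
    """Parse CSV text into row dicts, mimicking the header/inferSchema options."""
    rows = list(csv.reader(io.StringIO(text)))
    if header:
        columns, data = rows[0], rows[1:]
    else:
        # Without a header, Spark names the columns _c0, _c1, ...
        columns = [f"_c{i}" for i in range(len(rows[0]))]
        data = rows
    # Transpose rows into columns so each column gets a single type.
    cols = [list(col) for col in zip(*data)]
    if infer_schema:
        cols = [infer_column(col) for col in cols]
    return [dict(zip(columns, row)) for row in zip(*cols)]


print(read_expected(SAMPLE_CSV))
# [{'department': 'Finance', 'count': 12}, {'department': 'Sales', 'count': 30}]
print(read_expected(SAMPLE_CSV, infer_schema=False)[0])
# {'department': 'Finance', 'count': '12'}
```

The sketch infers types per column rather than per value, as Spark does. A column is converted only when every value in it parses, so a single non-numeric entry keeps the whole column as text.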

Example with Fabric shortcuts

With Microsoft Fabric shortcuts, tables in a remote Lakehouse can be queried as if they were local tables:

Test cross-workspace data:
  source:
    type: sql_spark
    query: |
      SELECT *
      FROM dwlakehouse.factsales
      WHERE sale_date >= '2024-01-01'
  expected:
    type: sql_spark
    query: |
      SELECT *
      FROM reportinglakehouse.factsales_report
      WHERE sale_date >= '2024-01-01'

See Fabric shortcuts strategy for more details.
