DataFusion Server Usage Guide

What Are Data Sources

Data Source Definitions

A data source definition describes data held in local files, object stores, API responses, and similar locations, which is converted into the Arrow memory model used by DataFusion’s DataFrame.

The dataset retrieved through a data source definition is stored in memory in Arrow’s RecordBatch format. From the DataFusion Server’s point of view, each data source is equivalent to a SQL table.
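Arrow stores data by column rather than by row: a RecordBatch holds one equally long array per column, with nulls for missing values. The following is a minimal, standard-library-only sketch of that pivot for a few ndJSON records. It only mimics the columnar layout. It does not use the Arrow library or the DataFusion Server's actual loader, and the helper `rows_to_columns` and the sample data are hypothetical.

```python
import json

# Hypothetical sample input: three ndJSON records (one JSON object per line).
NDJSON = '{"id": 1, "name": "alice"}\n{"id": 2, "name": "bob"}\n{"id": 3, "name": null}\n'


def rows_to_columns(rows):
    """Pivot row-oriented records into a column-oriented mapping.

    Mimics, in plain Python, the shape of an Arrow RecordBatch:
    one equally long list per column, with None standing in for null.
    """
    columns = {}
    for i, row in enumerate(rows):
        for key in row:
            if key not in columns:
                # Backfill nulls for the rows seen before this column appeared.
                columns[key] = [None] * i
        for key, values in columns.items():
            # Rows that lack a known column contribute a null.
            values.append(row.get(key))
    return columns


rows = [json.loads(line) for line in NDJSON.splitlines() if line.strip()]
batch = rows_to_columns(rows)
print(batch)
```

Once the data is in this shape, each key behaves like a column of a SQL table. This is why the server can treat a registered data source as a table.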

Supported Formats

The DataFusion Server supports data sources in the following standard formats:

  • Arrow
  • JSON
  • ndJSON (new-line delimited JSON)
  • CSV
  • Parquet
  • Avro
  • Arrow Flight gRPC
  • Delta Lake
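JSON and ndJSON are listed separately because they lay out records differently. The sketch below assumes that "JSON" means a single document holding an array of records. Under that assumption, the whole document must be parsed before any record is available. ndJSON puts one complete object on each line, so each line can be parsed on its own. The sample strings and helper names are hypothetical.

```python
import json

# Assumed JSON layout: one document containing an array of every record.
JSON_TEXT = '[{"id": 1, "city": "Tokyo"}, {"id": 2, "city": "Osaka"}]'

# The same records as ndJSON: one complete JSON object per line.
NDJSON_TEXT = '{"id": 1, "city": "Tokyo"}\n{"id": 2, "city": "Osaka"}\n'


def parse_json(text):
    # The entire document is parsed in one call.
    return json.loads(text)


def parse_ndjson(text):
    # Each non-empty line is parsed independently, which allows streaming.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


print(parse_ndjson(NDJSON_TEXT))
```

Both helpers return the same list of records here. Only the on-disk framing differs.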

Data Source Connector Plugins

Data sources can easily be extended by implementing a data source connector plugin in Python; details are explained in Data Source Connector Plugin.

Data Source Format and Location Matrix

format \ location | http(s) | grpc(+tls) | local filesystem | object store | plugin
Arrow
JSON
ndJSON
CSV
Parquet
Avro
Arrow Flight
Delta Lake
  • Supported
  • Save feature supported
  • Supported object stores (see Configuration and Usage):
    • Amazon S3
    • Google Cloud Storage
    • Microsoft Azure Blob Storage
    • WebDAV
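The matrix classifies each source by its location, in addition to its format. The sketch below maps a location string to one of the matrix's location columns. The URI schemes are assumptions made for illustration, not the server's documented location syntax. These include `s3://`, `gs://`, and `az://` for object stores, and a bare path for the local filesystem. WebDAV is left out because it is served over HTTP(S) URLs, so it cannot be told apart by scheme alone. The plugin column is also left out, since a plugin defines its own locations. The function `location_column` is hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical scheme-to-column mapping; the server's real syntax may differ.
LOCATION_COLUMNS = {
    "http": "http(s)",
    "https": "http(s)",
    "grpc": "grpc(+tls)",
    "grpc+tls": "grpc(+tls)",
    "file": "local filesystem",
    "s3": "object store",  # assumed scheme for Amazon S3
    "gs": "object store",  # assumed scheme for Google Cloud Storage
    "az": "object store",  # assumed scheme for Microsoft Azure Blob Storage
}


def location_column(location):
    """Return the matrix column a location would fall under, or 'unknown'."""
    scheme = urlparse(location).scheme.lower()
    if scheme == "":
        # A bare path such as /data/events.parquet is treated as local.
        return "local filesystem"
    return LOCATION_COLUMNS.get(scheme, "unknown")


print(location_column("s3://bucket/events.parquet"))
```

Keeping the mapping in a dictionary makes it easy to add a scheme without touching the lookup logic.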