Hello all

I want to start a discussion about adding a provider for the Delta Sharing
<http://delta.io/sharing> protocol, an open source protocol for data sharing
between organizations. A number of data providers have already adopted
it, for example, Nasdaq, NYSE, AWS, etc. (see the site). The protocol and
reference implementation were initially developed by Databricks in
collaboration with other organizations, but the project is now under
the umbrella of the Linux Foundation.

The implementation (Pull Request
<https://github.com/apache/airflow/pull/22692>, original GitHub issue
<https://github.com/apache/airflow/issues/19473>) consists of two pieces:

   - A Delta Sharing sensor that detects new versions of data in a given
   Delta Sharing table. It can be used to trigger data processing, for
   example, to start a Spark job or to download data for processing with
   Python.
   - A Delta Sharing operator that downloads data from a given Delta Sharing
   table to a local disk, where the data can then be processed.
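
To illustrate the idea behind the sensor, here is a rough sketch. It is not
the code from the pull request. It assumes the "query table version" REST
call as I understand it from the protocol specification (a GET on
.../shares/{share}/schemas/{schema}/tables/{table}/version that returns the
version in the Delta-Table-Version response header). The function names, the
bearer-token handling, and the rule that the first check always counts as new
are hypothetical and only for illustration.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request, urlopen


def table_version_url(endpoint: str, share: str, schema: str, table: str) -> str:
    """Build the URL of the (assumed) 'query table version' call."""
    return (
        f"{endpoint.rstrip('/')}"
        f"/shares/{quote(share, safe='')}"
        f"/schemas/{quote(schema, safe='')}"
        f"/tables/{quote(table, safe='')}/version"
    )


def fetch_table_version(url: str, token: str, timeout: float = 30.0) -> int:
    """Ask the sharing server for the current table version.

    Assumption: the server reports the version in the
    Delta-Table-Version response header.
    """
    request = Request(url, headers={"Authorization": f"Bearer {token}"})
    with urlopen(request, timeout=timeout) as response:
        return int(response.headers["Delta-Table-Version"])


def has_new_version(last_seen: Optional[int], current: int) -> bool:
    """Return True when the table moved past the last processed version.

    Hypothetical rule: if nothing was processed yet, any version is new.
    """
    return last_seen is None or current > last_seen
```

A sensor built this way would call fetch_table_version on every poke, compare
the result with the last processed version via has_new_version, and let the
downstream tasks run only when it returns True.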

To conform to AIP-47
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-47+New+design+of+Airflow+System+Tests>,
a system test was added to make sure that all changes are tested. This
system test uses a public endpoint of the reference Delta Sharing
implementation, and needs only network connectivity and a small amount of
CPU and local disk resources.

-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
