Hello all,

I want to start a discussion about adding a provider for the Delta Sharing <http://delta.io/sharing> protocol, an open source protocol for data sharing between organizations. A number of data providers have already adopted it, for example Nasdaq, NYSE, AWS, and others (see the site). The protocol and reference implementation were initially developed by Databricks in collaboration with other organizations, and the project is now under the umbrella of the Linux Foundation.

The implementation (pull request <https://github.com/apache/airflow/pull/22692>, original GitHub issue <https://github.com/apache/airflow/issues/19473>) consists of two pieces:

- A Delta Sharing sensor that detects new versions of data in a given Delta Sharing table. It can be used to trigger data processing, for example to start a Spark job or to download data for processing with Python.
- A Delta Sharing operator that downloads data from a given Delta Sharing table to local disk, where the data can then be processed.

To conform to AIP-47 <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-47+New+design+of+Airflow+System+Tests>, a system test was added to make sure that all changes are tested. This system test uses a public endpoint of the reference Delta Sharing implementation and needs only network connectivity and a small amount of CPU and local disk resources.

--
With best wishes,
Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)