Very interested in this. I am supportive of adding a generic dataframe abstraction to Airflow. However, I do have a few questions about how and where to include it from a dependency perspective.
I do wonder if this needs to be in Core Airflow or can be something outside of Core Airflow as an Operator building block.

On Tue, Jun 25, 2024 at 12:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> > Yeah I'm not saying there shouldn't be an airflow library. It's just
> > unclear to me what its purpose would be and it would be helpful in
> > evaluating the question to have some kind of a sketch of it. What
> > interface it would introduce, how it would be used etc.
>
> Yep. Very reasonable questions to ask.
>
> Mind - it's not hashed out - it's not a "proposal" yet, just "discussion" -
> mostly to find out whether this one raises someone's eyebrows in terms of
> "yeah, we would love to have it" or "meh - not worth getting into
> details". So I definitely do not have many answers.
>
> But roughly speaking. Ibis defines python API for data processing (that's
> simplifying things of course) to manipulate data - mapping the
> Pythonic/Dataframe interface to underlying DB engines. You can essentially
> use the same code to access data in local in-memory DuckDB in dev and
> Bigquery in PROD.
>
> g = t.group_by(["species", "island"]).agg(count=t.count()).order_by("count")
>
> As mentioned before - initially (similarly to common.io) - Airflow
> connection id's could be used to instantiate Ibis connection
>
> instead of
>
> con = ibis.connect("duckdb://")
>
> it would be (for example - it likely could be better):
>
> con = common.dataframe.Dataframe.get(conn_id)
>
> So nothing really fancy. No new API to define, just a glue to existing
> Airflow connections.
>
> Eventually (and as Gil mentioned that might be future) - thanks to unified
> Ibis API, it could mean that lineage information is extracted automatically
> at the level of the Ibis API, rather than having to implement it separately
> for each of those engines that are supported by Ibis (and future ones).
>
> I hope that's enough to hear if that seems like something that is of
> interest :).
>
> J.
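
To make the "glue to existing Airflow connections" idea a bit more concrete, here is a rough sketch of what such a helper might look like if it lived outside Core. The names `ConnInfo`, `to_ibis_url` and `get_backend` are hypothetical and are not an existing Airflow or Ibis API; `ConnInfo` is only a stand-in for the fields an Airflow connection carries. The one real call assumed is `ibis.connect()`, which accepts a URL string such as "duckdb://".

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass
class ConnInfo:
    """Hypothetical stand-in for the fields of an Airflow connection."""

    conn_type: str
    host: Optional[str] = None
    login: Optional[str] = None
    password: Optional[str] = None
    port: Optional[int] = None
    schema: Optional[str] = None


def to_ibis_url(conn: ConnInfo) -> str:
    """Build a URL string of the kind passed to ibis.connect()."""
    if conn.conn_type == "duckdb":
        # No host/path means an in-memory DuckDB database ("duckdb://").
        return f"duckdb://{conn.host or ''}"

    # Percent-encode credentials so characters like "@" do not break the URL.
    auth = ""
    if conn.login:
        auth = quote(conn.login, safe="")
        if conn.password:
            auth += ":" + quote(conn.password, safe="")
        auth += "@"
    port = f":{conn.port}" if conn.port else ""
    database = f"/{conn.schema}" if conn.schema else ""
    return f"{conn.conn_type}://{auth}{conn.host or ''}{port}{database}"


def get_backend(conn: ConnInfo):
    """Hypothetical counterpart of common.dataframe.Dataframe.get(conn_id)."""
    # Imported lazily so the URL helper has no hard dependency on ibis.
    import ibis

    return ibis.connect(to_ibis_url(conn))
```

With something along these lines, the Ibis dependency could stay in an optional package and only be imported when a task actually asks for a backend, which is part of why I am asking whether this has to be in Core.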