Very interested in this.

I am quite positive about adding a generic dataframe abstraction to
Airflow.
However, I do have a few questions about how and where to include it
from a dependency perspective.

I wonder whether this needs to be in Core Airflow, or whether it could live
outside of Core Airflow as an Operator building block.
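
To make the dependency question concrete, here is a minimal sketch of what
the glue described in the quoted message below might look like if it lived
outside Core Airflow. It assumes the glue only has to turn connection fields
into a URL of the kind passed to ibis.connect(). ConnInfo and to_backend_url
are hypothetical names, and ConnInfo is only a stand-in for an Airflow
connection, not the real class. The exact URL format each Ibis backend
accepts is also an assumption here.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass
class ConnInfo:
    """Hypothetical stand-in for the fields of an Airflow connection."""

    conn_type: str
    host: Optional[str] = None
    login: Optional[str] = None
    password: Optional[str] = None
    port: Optional[int] = None
    schema: Optional[str] = None


def to_backend_url(conn: ConnInfo) -> str:
    """Build a backend URL (hypothetical helper) from connection fields."""
    if not conn.host:
        # No host: treat it as a local/in-memory backend, e.g. "duckdb://".
        return f"{conn.conn_type}://"
    auth = ""
    if conn.login:
        # Percent-encode credentials so characters like "@" stay unambiguous.
        auth = quote(conn.login, safe="")
        if conn.password:
            auth += ":" + quote(conn.password, safe="")
        auth += "@"
    port = f":{conn.port}" if conn.port else ""
    path = f"/{conn.schema}" if conn.schema else ""
    return f"{conn.conn_type}://{auth}{conn.host}{port}{path}"


dev_url = to_backend_url(ConnInfo("duckdb"))
prod_url = to_backend_url(
    ConnInfo("postgres", host="db", login="u", password="p@ss",
             port=5432, schema="analytics")
)
```

An operator or provider package could then hand such a URL to ibis.connect(),
which would keep the Ibis dependency out of Core Airflow.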



On Tue, Jun 25, 2024 at 12:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> > Yeah I'm not saying there shouldn't be an airflow library.  It's just
> > unclear to me what its purpose would be and it would be helpful in
> > evaluating the question to have some kind of a sketch of it.  What
> > interface it would introduce, how it would be used etc.
> >
>
> Yep. Very reasonable questions to ask.
>
> Mind, it's not hashed out. It's not a "proposal" yet, just a "discussion",
> mostly to find out whether this one raises someone's eyebrows in terms of
> "yeah, we would love to have it" or "meh, not worth getting into
> details".  So I definitely do not have many answers.
>
> But roughly speaking: Ibis defines a Python API for data processing (that's
> simplifying things, of course) that maps a Pythonic/dataframe interface
> to underlying DB engines. You can essentially use the same code to access
> data in a local in-memory DuckDB in dev and BigQuery in prod.
>
> g = t.group_by(["species", "island"]).agg(count=t.count()).order_by("count")
>
> As mentioned before, initially (similarly to common.io) Airflow
> connection IDs could be used to instantiate an Ibis connection.
>
> Instead of
>
> con = ibis.connect("duckdb://")
>
> it would be (for example; it likely could be better):
>
> con = common.dataframe.Dataframe.get(conn_id)
>
> So nothing really fancy. No new API to define, just glue to existing
> Airflow connections.
>
> Eventually (and as Gil mentioned, that might be in the future), thanks to
> the unified Ibis API, lineage information could be extracted automatically
> at the level of the Ibis API, rather than having to be implemented separately
> for each of the engines supported by Ibis (and future ones).
>
> I hope that's enough to tell whether this seems like something that is of
> interest :).
>
> J.
>
