Fascinating - I was just visiting the list to discuss this very topic,
having just started building the very same provider. :)

Ahmad, thanks for your work here. It's about in line with what I was
planning.

A bit about me: I'm one of the earliest contributors to TinkerPop, a
current committer and its first PMC Chair (I've stepped away from PMC
duties in recent years). I think an Airflow integration for TinkerPop would
be quite useful to folks, and I have come across requests for it, so the
need would appear to be there.

To build on what Ahmad wrote in relation to a "common.graph" proposal, the
graph system world isn't terribly aligned on standards. There are many
takes on what constitutes a great query language and the protocols that
support that. As a result, not every graph system out there reaches for
TinkerPop and Gremlin. While there is a push toward a standard graph query
language in GQL[1], I think it will take many years to see full adoption,
and even then, languages like Gremlin will have a place in graphs for the
style and functional gaps that they fill. That said, in the long
term, I think that TinkerPop has always taken a position that we care about
enabling usage of whatever graph database, graph query language, etc, that
you like and have looked to open ways to provide connectivity there. For
example, I recently did a Twitch stream with the creator of a GQL to Gremlin
compiler[2], which provides an interesting first thought on how we might
work with GQL in the future. In this respect, I think having TinkerPop as a
core of a "common.graph" could make some sense, but without Neo4j support
(they stopped supporting the libraries we needed to keep up with their
upgrades a number of years ago) and a few others, I think it's hard to
claim that title.

I'm only just learning Airflow and how things are organized, but perhaps
"common.graph" could be an amalgamation of the different graph "things" out
there somehow?

I will take a look at the PR in greater detail later today to see if I can
provide any comments. Very exciting to see this happening! Thanks!

[1] https://www.gqlstandards.org/
[2] https://www.youtube.com/watch?v=Kd0i-ieni6M

On 2025/02/24 11:33:31 Ahmad Farhan wrote:
> I have worked with two different graph database vendors: Azure Cosmos DB
> and Neo4j. During our migration to Neo4j, we discovered that using the
> Gremlin language wasn’t possible; we were forced to rewrite all our queries
> into Cypher, which is the native language for Neo4j and, in my experience,
> much simpler for querying.
>
> This situation highlights a key challenge for a common abstraction: the
> underlying query languages and connection/authentication mechanisms vary
> significantly. Gremlin is not only different from Cypher in syntax but is
> also deprecated for Neo4j (see
> https://tinkerpop.apache.org/docs/3.7.3/reference/#neo4j-gremlin).
>
> The question would be how can the common approach accommodate these
> different query languages?
>
> On Fri, Feb 21, 2025 at 7:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > Without deep looking at the code I love the idea - it's very similar to
> > what we have for common.sql and common.io - and soon common.messaging -
> > I also - long time ago - suggested common.dataframe that someone could
> > submit using Apache Ibis:
> > https://lists.apache.org/thread/qx3yh6h0l6jb0kh3fz9q95b3x5b4001l  -
> > similarly I believe there was an idea about common.llm ...
> >
> > I think the "common" pattern is a great one for Airflow, to build on top
> > of "other giants" who build those common abstractions that you can easily
> > switch between different implementations of various data access layers.
> >
> > My suggestion and question - would be however (not very strong on it, I
> > would love to hear what others think, I know it's been somewhat
> > contentious when I started the ibis discussion) - would be to make it
> > "common.graph", "common.dataframe" - instead of "apache.gremlin" or
> > "apache.ibis" - just to stress that those are not implementations of
> > particular service but opinionated choice of particular technology to do
> > "common" operations. This is what essentially "common.io" is - it should
> > be named "fsspec" provider if we were to name it by the "library" that
> > implemented it.
> >
> > J.
> >
> >
> > On Fri, Feb 21, 2025 at 8:22 PM Ahmad Farhan <ah...@gmail.com>
> > wrote:
> >
> > > Hi Everyone,
> > >
> > > I’ve created a draft PR (https://github.com/apache/airflow/pull/46977)
> > > to introduce and discuss a new provider for using Gremlin, the graph
> > > traversal language of Apache TinkerPop (more details here:
> > > https://tinkerpop.apache.org/gremlin.html). Gremlin is supported by
> > > various graph database vendors such as Azure Cosmos DB and Amazon
> > > Neptune. Previously, I had to develop a custom hook to query data from
> > > Azure Cosmos DB using Apache Gremlin.
> > >
> > > I managed to create a provider and run it locally on the main branch.
> > > However, I ran into the BaseHook issue (
> > > https://github.com/apache/airflow/issues/45233) on that branch, so I
> > > ended up testing it fully on the v2-10-test branch. The PR should be
> > > complete, but I’ve kept it as a draft for now while we discuss the
> > > provider.
> > >
> > > I’m a new contributor, so I’m especially eager to hear your feedback.
> > > Comments on the PR are very welcome, and please feel free to reach out
> > > with any questions via email or Slack.
> > >
> > > Thanks,
> > > Ahmad Farhan
> > >
> >
>
