Fascinating - I was just visiting the list to discuss this very topic, having just started building the very same provider. :)
Ahmad, thanks for your work here. It's about in line with what I was planning. A bit about me: I'm one of the earliest contributors to TinkerPop, a current committer, and its first PMC Chair (I've stepped away from PMC duties in recent years). I think an Airflow integration for TinkerPop would be quite useful to folks, and I have come across requests for it, so the need would appear to be there.

To build on what Ahmad wrote in relation to a "common.graph" proposal, the graph system world isn't terribly aligned on standards. There are many takes on what constitutes a great query language and the protocols that support it. As a result, not every graph system out there reaches for TinkerPop and Gremlin. While there is a push toward a standard graph query language in GQL[1], I think it will take many years to see full adoption, and even then languages like Gremlin will have a place in graphs for the style and functional gaps that they fill.

That said, in the long term, I think that TinkerPop has always taken the position that we care about enabling usage of whatever graph database, graph query language, etc. that you like, and we have looked for open ways to provide connectivity there. For example, I recently did a Twitch stream with the creator of a GQL to Gremlin compiler[2], which provides an interesting first thought on how we might work with GQL in the future. In this respect, I think having TinkerPop as the core of a "common.graph" could make some sense, but without Neo4j support (they stopped supporting the libraries we needed to keep up with their upgrades a number of years ago) and a few others, I think it's hard to claim that title. I'm only just learning Airflow and how things are organized, but perhaps "common.graph" could be an amalgamation of the different graph "things" out there somehow?

I will take a look at the PR in greater detail later today to see if I can provide any comments. Very exciting to see this happening! Thanks!

[1] https://www.gqlstandards.org/
[2] https://www.youtube.com/watch?v=Kd0i-ieni6M

On 2025/02/24 11:33:31 Ahmad Farhan wrote:
> I have worked with two different graph database vendors—Azure Cosmos DB and
> Neo4j. During our migration to Neo4j, we discovered that using the Gremlin
> language wasn’t possible; we were forced to rewrite all our queries into
> Cypher, which is the native language for Neo4j and, in my experience, much
> simpler for querying.
>
> This situation highlights a key challenge for a common abstraction: the
> underlying query languages and connection/authentication mechanisms vary
> significantly. Gremlin is not only different from Cypher in syntax but is
> also deprecated for Neo4j (see
> https://tinkerpop.apache.org/docs/3.7.3/reference/#neo4j-gremlin).
>
> The question would be how can the common approach accommodate these
> different query languages?
>
> On Fri, Feb 21, 2025 at 7:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > Without deep looking at the code I love the idea - it's very similar to
> > what we have for common.sql and common.io - and soon common.messaging - I
> > also - long time ago - suggested common.dataframe that someone could submit
> > using Apache Ibis:
> > https://lists.apache.org/thread/qx3yh6h0l6jb0kh3fz9q95b3x5b4001l -
> > similarly I believe there was an idea about common.llm ...
> >
> > I think the "common" pattern is a great one for Airflow, to build on top of
> > "other giants" who build those common abstractions that you can easily
> > switch between different implementations of various data access layers.
> >
> > My suggestion and question - would be however (not very strong on it, I
> > would love to hear what others think, I know it's been somewhat contentious
> > when I started the ibis discussion) - would be to make it "common.graph",
> > "common.dataframe" - instead of "apache.gremlin" or "apache.ibis" - just to
> > stress that those are not implementations of particular service but
> > opinionated choice of particular technology to do "common" operations. This
> > is what essentially "common.io" is . - it should be named "fsspec" provider
> > if we were to name it by the "library" that implemented it.
> >
> > J.
> >
> >
> > On Fri, Feb 21, 2025 at 8:22 PM Ahmad Farhan <ah...@gmail.com> wrote:
> >
> > > Hi Everyone,
> > >
> > > I’ve created a draft PR (https://github.com/apache/airflow/pull/46977) to
> > > introduce and discuss a new provider for using Gremlin—the graph traversal
> > > language of Apache TinkerPop (more details here:
> > > https://tinkerpop.apache.org/gremlin.html). Gremlin is supported by various
> > > graph database vendors such as Azure Cosmos DB and Amazon Neptune.
> > > Previously, I had to develop a custom hook to query data from Azure Cosmos
> > > DB using Apache Gremlin.
> > >
> > > I managed to create a provider and run it locally on the main branch.
> > > However, I ran into the BaseHook issue (
> > > https://github.com/apache/airflow/issues/45233) on that branch, so I ended
> > > up testing it fully on the v2-10-test branch. The PR should be complete,
> > > but I’ve kept it as a draft for now while we discuss the provider.
> > >
> > > I’m a new contributor, so I’m especially eager to hear your feedback.
> > > Comments on the PR is very welcome, and please feel free to reach out with
> > > any questions via email or Slack.
> > >
> > > Thanks,
> > > Ahmad Farhan