Hi everyone, I just wanted to check in on the status of the TinkerPop Provider PR:

https://github.com/apache/airflow/pull/47446

I don't know if there is any outstanding work left here, but I was curious
whether committers have any sense of when this body of work will be merged
and made available in a release.

Thanks,
Stephen

On Sat, Mar 8, 2025 at 2:57 PM Paul King <pa...@apache.org> wrote:

> Nice to see the Apache TinkerPop/Gremlin support. As well as using Gremlin
> with TinkerPop's TinkerGraph, I have also used it with OrientDB, ArcadeDB,
> and Apache HugeGraph. If anyone is interested, I did a blog post here:
> https://groovy.apache.org/blog/groovy-graph-databases
> (Sorry but it's Groovy not Python)
>
> Cheers, Paul.
>
> On 2025/03/08 11:34:11 Ahmad Farhan wrote:
> > Hi!
> > I managed to do the last cleanup on the PR, and it is ready for thorough
> > review. I will create a `[LAZY CONSENSUS]` thread on Monday/Tuesday. Feel
> > free to review and comment on the PR in the meantime.
> >
> > Farhan
> >
> > On Thu, Mar 6, 2025 at 1:40 PM Stephen Mallette <spmalle...@gmail.com>
> > wrote:
> >
> > > Hi Ahmad, thanks for the updates! I've spent some time looking at the
> > > changes and have added some comments and questions. I've pointed out a
> > > couple things for future work that I think will be important, but
> > > nothing that needs to be changed for this PR in my mind. Looking
> > > forward to seeing your responses, other community feedback, and
> > > ultimately a merge of the provider - take care!
> > >
> > > On Thu, Mar 6, 2025 at 7:28 AM Ahmad Farhan <ahmad.farhan9...@gmail.com>
> > > wrote:
> > >
> > > > Hello again!
> > > > After several days of debugging and branch restructuring I managed to
> > > > create another PR (https://github.com/apache/airflow/pull/47446). This
> > > > is due to major merge conflicts after the www clean up.
> > > >
> > > > The renaming of the provider to apache-tinkerpop is done but I kept
> > > > the integration testing as 'gremlin' to avoid the conflict/confusion
> > > > with the server name within the CI.
> > > >
> > > > Please review the PR thoroughly and I am happy to work on the
> > > > following steps.
> > > >
> > > > Thanks,
> > > > Farhan
> > > >
> > > > On Wed, Feb 26, 2025 at 7:43 PM Ahmad Farhan
> > > > <ahmad.farhan9...@gmail.com> wrote:
> > > >
> > > > > I did look into the naming and I thought that it would need to be
> > > > > discussed at some point before the lazy consensus stage after the
> > > > > dev work is done but I guess I was wrong :)
> > > > >
> > > > > I read through some docs regarding the naming and I kept thinking
> > > > > that Apache Gremlin might not be right, so I decided to remove
> > > > > 'Apache' from all doc strings. One thing that confused me is that
> > > > > one of the Microsoft docs says Apache Gremlin (here:
> > > > > https://learn.microsoft.com/en-us/azure/cosmos-db/gremlin/introduction
> > > > > ).
> > > > >
> > > > > I will change the folder to apache/tinkerpop. Thanks Stephen for
> > > > > the in-depth explanation.
> > > > >
> > > > > On Wed, Feb 26, 2025 at 6:56 PM Jarek Potiuk <ja...@potiuk.com>
> > > > > wrote:
> > > > >
> > > > > > Cool. I will let Ahmad comment, but I think we found the
> > > > > > **someone** who will help in case there are some future issues
> > > > > > with the Tinkerpop/Gremlin provider.
> > > > > > While I like Gremlin better (it's just a cool name and I like
> > > > > > the logo), tinkerpop has a cool logo as well:
> > > > > > https://tinkerpop.apache.org/index.html
> > > > > >
> > > > > > So as long as we decide not to use common.graph -> I am fine
> > > > > > with both :)
> > > > > >
> > > > > > j.
> > > > > >
> > > > > > On Wed, Feb 26, 2025 at 7:42 PM Stephen Mallette
> > > > > > <spmalle...@gmail.com> wrote:
> > > > > >
> > > > > > > On Wed, Feb 26, 2025 at 12:57 PM Jarek Potiuk <ja...@potiuk.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > > In the interest of ASF trademarks, I would suggest it be
> > > > > > > > > called "apache/tinkerpop" with "Gremlin" naming reserved
> > > > > > > > > for operators and the like, as it is now with
> > > > > > > > > GremlinOperator. I think this makes sense because it is
> > > > > > > > > connecting to TinkerPop-enabled systems via Gremlin. I
> > > > > > > > > would similarly suggest that references to "Apache
> > > > > > > > > Gremlin" and the like become "Apache TinkerPop".
> > > > > > > >
> > > > > > > > That's an interesting one - indeed TinkerPop is the
> > > > > > > > PMC/Framework - Gremlin is the language.
> > > > > > > >
> > > > > > > > I am not sure we are actually using TinkerPop here - because
> > > > > > > > TinkerPop is the whole framework - Ahmad, can you explain the
> > > > > > > > relation there - do those other systems simply implement
> > > > > > > > Gremlin as a language or do they use TinkerPop for something /
> > > > > > > > as a backend?
> > > > > > >
> > > > > > > I'm sure Ahmad could answer but I'll quickly offer my take. I
> > > > > > > think that in this case we should prefer "TinkerPop" over
> > > > > > > Gremlin as a top-level name particularly because it's prefixed
> > > > > > > with "Apache" and there is no "Apache Gremlin" which I tend to
> > > > > > > think is confusing when the words are that close together. I
> > > > > > > can't recall over the years just how many times I've asked for
> > > > > > > corrections in blog posts. :)
> > > > > > >
> > > > > > > > Because that's a bit of a conceptual difference here.
> > > > > > > > For example in the provider we are importing
> > > > > > > > https://pypi.org/project/gremlinpython not "tinkerpop" - and
> > > > > > > > it also does not have tinkerpop as a dependency.
> > > > > > >
> > > > > > > A bit of history goes along with a lot of our naming for what
> > > > > > > we term Gremlin Language Variants (GLVs), like gremlinpython,
> > > > > > > which are variants of Gremlin natively implemented to allow
> > > > > > > users to express Gremlin in the idioms of their own language.
> > > > > > > They also provide driver connectivity to compatible servers.
> > > > > > > TinkerPop has mostly inherited all of its language variants,
> > > > > > > including gremlinpython which was the first, from third-party
> > > > > > > community developers. As a project, we didn't really get a hand
> > > > > > > in the naming so with those projects already in heavy use we
> > > > > > > just kind of stuck to it and even doubled down (like when we
> > > > > > > built gremlin-go within the ASF).
> > > > > > >
> > > > > > > I think in this case, your project organization under "apache"
> > > > > > > seems to almost lend itself nicely to apache/tinkerpop. I think
> > > > > > > users will recognize it as readily as they recognize Gremlin.
> > > > > > >
> > > > > > > > I wonder if Gremlin is also a Trademark by Apache? Maybe we
> > > > > > > > should ask tinkerpop PMC what they think about it?
> > > > > > >
> > > > > > > Gremlin is not an ASF trademark. That was debated for quite a
> > > > > > > long time with trademarks@ along with deciding if Gremlin, the
> > > > > > > character and his friends[1], were to be protected. In the end,
> > > > > > > for reasons I'm not sure I quite remember, the ASF didn't think
> > > > > > > it was necessary.
> > > > > > >
> > > > > > > Anyway, I'm not sure if you noted my earlier post[2] but I'm
> > > > > > > one of the original contributors to Apache TinkerPop, even
> > > > > > > before we brought it to the ASF so I'm pretty familiar with our
> > > > > > > project. :)
> > > > > > >
> > > > > > > [1] https://github.com/apache/tinkerpop/blob/master/docs/static/images/tinkerpop3-splash.png
> > > > > > > [2] https://lists.apache.org/thread/9hf4t8hyk944fyo4q3nygczyo5xhk18y
> > > > > > >
> > > > > > > > J.
> > > > > > > >
> > > > > > > > On Wed, Feb 26, 2025 at 4:55 PM Stephen Mallette
> > > > > > > > <spmalle...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > On 2025/02/26 12:38:02 Jarek Potiuk wrote:
> > > > > > > > > > Yeah. `apache/gremlin` seems like a better option then.
> > > > > > > > > > Does anyone have anything against it?
> > > > > > > > >
> > > > > > > > > In the interest of ASF trademarks, I would suggest it be
> > > > > > > > > called "apache/tinkerpop" with "Gremlin" naming reserved
> > > > > > > > > for operators and the like, as it is now with
> > > > > > > > > GremlinOperator. I think this makes sense because it is
> > > > > > > > > connecting to TinkerPop-enabled systems via Gremlin. I
> > > > > > > > > would similarly suggest that references to "Apache
> > > > > > > > > Gremlin" and the like become "Apache TinkerPop".
> > > > > > > > >
> > > > > > > > > > I think we are pretty happy with accepting "other
> > > > > > > > > > apache" projects as providers, so I see no issue with
> > > > > > > > > > Gremlin - knowing that we can always reach out to our
> > > > > > > > > > friendly Apache Community in case of any issues.
> > > > > > > > > > So - unless we hear any "opposition" in a few days, I
> > > > > > > > > > think it would make sense if you start a
> > > > > > > > > > `[LAZY CONSENSUS]` thread - without a need for a
> > > > > > > > > > `[VOTE]` thread.
> > > > > > > > > >
> > > > > > > > > > One thing though that I would love to have - is to also
> > > > > > > > > > have an integration test if possible (we had it with
> > > > > > > > > > apache.kafka for example) - those are tests that could
> > > > > > > > > > run **some** graphdb database locally (via
> > > > > > > > > > docker-compose) and run very rudimentary checks against
> > > > > > > > > > a "real" database, not a mocked call. That would make
> > > > > > > > > > it more robust.
> > > > > > > > > >
> > > > > > > > > > More about integration tests, how to build, run, test
> > > > > > > > > > them and integrate them in our CI can be found here:
> > > > > > > > > >
> > > > > > > > > > https://github.com/apache/airflow/blob/main/contributing-docs/testing/integration_tests.rst
> > > > > > > > > > - happy to help if you are stuck with it.
> > > > > > > > > >
> > > > > > > > > > J.
> > > > > > > > > >
> > > > > > > > > > On Wed, Feb 26, 2025 at 1:25 PM Ahmad Farhan
> > > > > > > > > > <ahmad.farhan9...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > I pushed changes to move the provider into the
> > > > > > > > > > > “apache” directory. After updating the class
> > > > > > > > > > > references across the project, I re-tested and all
> > > > > > > > > > > tests passed.
> > > > > > > > > > >
> > > > > > > > > > > Regarding the use of Gremlin (or another graph query
> > > > > > > > > > > language like Cypher and SPARQL) for a common package
> > > > > > > > > > > approach, here are my thoughts on the pros and cons:
> > > > > > > > > > >
> > > > > > > > > > > pros (I can see only one):
> > > > > > > > > > >
> > > > > > > > > > >    - Gremlin has been widely adopted by different
> > > > > > > > > > >    cloud vendors (e.g. Azure Cosmos DB with Apache
> > > > > > > > > > >    Gremlin and AWS Neptune) as well as in self-hosted
> > > > > > > > > > >    environments.
> > > > > > > > > > >
> > > > > > > > > > > cons:
> > > > > > > > > > >
> > > > > > > > > > >    - Gremlin, Cypher (native for Neo4j) and SPARQL
> > > > > > > > > > >    each have their own drivers for executing queries.
> > > > > > > > > > >    - To achieve a common abstraction, a wrapper around
> > > > > > > > > > >    each driver would be required. Each driver has its
> > > > > > > > > > >    own connection parameters, underlying protocols,
> > > > > > > > > > >    and may need method overrides for compatibility
> > > > > > > > > > >    with different Python versions.
> > > > > > > > > > >    - Not all vendors support every query language; for
> > > > > > > > > > >    instance, Gremlin for Neo4j has been deprecated in
> > > > > > > > > > >    recent releases, while Cosmos DB does not support
> > > > > > > > > > >    Cypher or SPARQL.
> > > > > > > > > > >
> > > > > > > > > > > While it would be ideal to have a unified graph query
> > > > > > > > > > > language and driver that works seamlessly across
> > > > > > > > > > > different vendors, such a solution does not exist at
> > > > > > > > > > > the moment.
> > > > > > > > > > > In my opinion, implementing provider-specific
> > > > > > > > > > > solutions for each query language (Gremlin, Cypher,
> > > > > > > > > > > SPARQL) is more realistic and practical given the
> > > > > > > > > > > current landscape.
> > > > > > > > > > >
> > > > > > > > > > > Happy to discuss further or answer any questions!
> > > > > > > > > > >
> > > > > > > > > > > Farhan
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Feb 24, 2025 at 11:33 AM Ahmad Farhan
> > > > > > > > > > > <ahmad.farhan9...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I have worked with two different graph database
> > > > > > > > > > > > vendors: Azure Cosmos DB and Neo4j. During our
> > > > > > > > > > > > migration to Neo4j, we discovered that using the
> > > > > > > > > > > > Gremlin language wasn’t possible; we were forced to
> > > > > > > > > > > > rewrite all our queries into Cypher, which is the
> > > > > > > > > > > > native language for Neo4j and, in my experience,
> > > > > > > > > > > > much simpler for querying.
> > > > > > > > > > > >
> > > > > > > > > > > > This situation highlights a key challenge for a
> > > > > > > > > > > > common abstraction: the underlying query languages
> > > > > > > > > > > > and connection/authentication mechanisms vary
> > > > > > > > > > > > significantly. Gremlin is not only different from
> > > > > > > > > > > > Cypher in syntax but is also deprecated for Neo4j
> > > > > > > > > > > > (see
> > > > > > > > > > > > https://tinkerpop.apache.org/docs/3.7.3/reference/#neo4j-gremlin
> > > > > > > > > > > > ).
> > > > > > > > > > > >
> > > > > > > > > > > > The question would be: how can the common approach
> > > > > > > > > > > > accommodate these different query languages?
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Feb 21, 2025 at 7:36 PM Jarek Potiuk
> > > > > > > > > > > > <ja...@potiuk.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Without deep looking at the code I love the idea -
> > > > > > > > > > > > > it's very similar to what we have for common.sql
> > > > > > > > > > > > > and common.io - and soon common.messaging - I
> > > > > > > > > > > > > also - long time ago - suggested common.dataframe
> > > > > > > > > > > > > that someone could submit using Apache Ibis:
> > > > > > > > > > > > > https://lists.apache.org/thread/qx3yh6h0l6jb0kh3fz9q95b3x5b4001l
> > > > > > > > > > > > > - similarly I believe there was an idea about
> > > > > > > > > > > > > common.llm ...
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think the "common" pattern is a great one for
> > > > > > > > > > > > > Airflow, to build on top of "other giants" who
> > > > > > > > > > > > > build those common abstractions that you can
> > > > > > > > > > > > > easily switch between different implementations
> > > > > > > > > > > > > of various data access layers.
> > > > > > > > > > > > >
> > > > > > > > > > > > > My suggestion and question - would be however (not
> > > > > > > > > > > > > very strong on it, I would love to hear what
> > > > > > > > > > > > > others think, I know it's been somewhat
> > > > > > > > > > > > > contentious when I started the ibis discussion) -
> > > > > > > > > > > > > would be to make it "common.graph",
> > > > > > > > > > > > > "common.dataframe" - instead of "apache.gremlin"
> > > > > > > > > > > > > or "apache.ibis" - just to stress that those are
> > > > > > > > > > > > > not implementations of a particular service but an
> > > > > > > > > > > > > opinionated choice of a particular technology to
> > > > > > > > > > > > > do "common" operations. This is what essentially
> > > > > > > > > > > > > "common.io" is - it should be named "fsspec"
> > > > > > > > > > > > > provider if we were to name it by the "library"
> > > > > > > > > > > > > that implemented it.
> > > > > > > > > > > > >
> > > > > > > > > > > > > J.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Feb 21, 2025 at 8:22 PM Ahmad Farhan
> > > > > > > > > > > > > <ahmad.farhan9...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Everyone,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I’ve created a draft PR (
> > > > > > > > > > > > > > https://github.com/apache/airflow/pull/46977 )
> > > > > > > > > > > > > > to introduce and discuss a new provider for
> > > > > > > > > > > > > > using Gremlin, the graph traversal language of
> > > > > > > > > > > > > > Apache TinkerPop (more details here:
> > > > > > > > > > > > > > https://tinkerpop.apache.org/gremlin.html).
> > > > > > > > > > > > > > Gremlin is supported by various graph database
> > > > > > > > > > > > > > vendors such as Azure Cosmos DB and Amazon
> > > > > > > > > > > > > > Neptune. Previously, I had to develop a custom
> > > > > > > > > > > > > > hook to query data from Azure Cosmos DB using
> > > > > > > > > > > > > > Apache Gremlin.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I managed to create a provider and run it
> > > > > > > > > > > > > > locally on the main branch. However, I ran into
> > > > > > > > > > > > > > the BaseHook issue (
> > > > > > > > > > > > > > https://github.com/apache/airflow/issues/45233 )
> > > > > > > > > > > > > > on that branch, so I ended up testing it fully
> > > > > > > > > > > > > > on the v2-10-test branch. The PR should be
> > > > > > > > > > > > > > complete, but I’ve kept it as a draft for now
> > > > > > > > > > > > > > while we discuss the provider.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I’m a new contributor, so I’m especially eager
> > > > > > > > > > > > > > to hear your feedback. Comments on the PR are
> > > > > > > > > > > > > > very welcome, and please feel free to reach out
> > > > > > > > > > > > > > with any questions via email or Slack.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Ahmad Farhan
> > > > > > > > >
> > > > > > > > > ---------------------------------------------------------------------
> > > > > > > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > > > > > > > For additional commands, e-mail: dev-h...@airflow.apache.org