Hello again! After several days of debugging and branch restructuring, I have opened a new PR, https://github.com/apache/airflow/pull/47446, because the previous branch had major merge conflicts after the www cleanup. The provider has been renamed to apache-tinkerpop, but I kept the integration test named 'gremlin' to avoid conflict/confusion with the server name within the CI.
Please review the PR thoroughly. I am happy to work on the next steps.

Thanks,
Farhan

On Wed, Feb 26, 2025 at 7:43 PM Ahmad Farhan <ahmad.farhan9...@gmail.com> wrote:
> I did look into the naming, but I thought it would be discussed at some point before the lazy consensus stage, after the dev work is done. I guess I was wrong :)
>
> I read through some docs regarding the naming and kept thinking that "Apache Gremlin" might not be right, so I decided to remove "Apache" from all docstrings. One thing that confused me is that Microsoft's documentation says "Apache Gremlin" (here: https://learn.microsoft.com/en-us/azure/cosmos-db/gremlin/introduction).
>
> I will change the folder to apache/tinkerpop. Thanks, Stephen, for the in-depth explanation.
>
> On Wed, Feb 26, 2025 at 6:56 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>> Cool. I will let Ahmad comment, but I think we found the **someone** who will help in case there are future issues with the TinkerPop/Gremlin provider.
>> I like Gremlin better (it's just a cool name and I like the logo), but TinkerPop has a cool logo as well: https://tinkerpop.apache.org/index.html
>>
>> So as long as we decide not to use common.graph -> I am fine with both :)
>>
>> j.
>>
>> On Wed, Feb 26, 2025 at 7:42 PM Stephen Mallette <spmalle...@gmail.com> wrote:
>>> On Wed, Feb 26, 2025 at 12:57 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>> In the interest of ASF trademarks, I would suggest it be called "apache/tinkerpop" with "Gremlin" naming reserved for operators and the like, as it is now with GremlinOperator. I think this makes sense because it is connecting to TinkerPop-enabled systems via Gremlin. I would similarly suggest that references to "Apache Gremlin" and the like become "Apache TinkerPop".
>>>>
>>>> That's an interesting one - indeed TinkerPop is the PMC/framework, and Gremlin is the language.
>>>>
>>>> I am not sure we are actually using TinkerPop here, because TinkerPop is the whole framework. Ahmad, can you explain the relation there? Do those other systems simply implement Gremlin as a language, or do they use TinkerPop for something / as a backend?
>>>
>>> I'm sure Ahmad could answer, but I'll quickly offer my take. I think that in this case we should prefer "TinkerPop" over Gremlin as a top-level name, particularly because it's prefixed with "Apache" and there is no "Apache Gremlin", which I tend to think is confusing when the words are that close together. I can't recall over the years just how many times I've asked for corrections in blog posts. :)
>>>
>>>> Because that's a bit of a conceptual difference here. For example, in the provider we are importing https://pypi.org/project/gremlinpython, not "tinkerpop", and it also does not have tinkerpop as a dependency.
>>>
>>> A bit of history goes along with a lot of our naming for what we term Gremlin Language Variants (GLVs), like gremlinpython, which are variants of Gremlin natively implemented to allow users to express Gremlin in the idioms of their own language. They also provide driver connectivity to compatible servers. TinkerPop has inherited most of its language variants, including gremlinpython, which was the first, from third-party community developers. As a project, we didn't really get a hand in the naming, so with those projects already in heavy use we just kind of stuck with it and even doubled down (like when we built gremlin-go within the ASF).
>>>
>>> I think in this case, your project organization under "apache" seems to lend itself nicely to apache/tinkerpop. I think users will recognize it as readily as they recognize Gremlin.
>>>
>>>> I wonder if Gremlin is also an Apache trademark? Maybe we should ask the TinkerPop PMC what they think about it?
>>>
>>> Gremlin is not an ASF trademark. That was debated for quite a long time with trademarks@, along with deciding whether Gremlin, the character, and his friends[1] were to be protected. In the end, for reasons I'm not sure I quite remember, the ASF didn't think it was necessary.
>>>
>>> Anyway, I'm not sure if you noted my earlier post[2], but I'm one of the original contributors to Apache TinkerPop, even before we brought it to the ASF, so I'm pretty familiar with our project. :)
>>>
>>> [1] https://github.com/apache/tinkerpop/blob/master/docs/static/images/tinkerpop3-splash.png
>>> [2] https://lists.apache.org/thread/9hf4t8hyk944fyo4q3nygczyo5xhk18y
>>>
>>>> J.
>>>>
>>>> On Wed, Feb 26, 2025 at 4:55 PM Stephen Mallette <spmalle...@apache.org> wrote:
>>>>> On 2025/02/26 12:38:02 Jarek Potiuk wrote:
>>>>>> Yeah. `apache/gremlin` seems like a better option then. Does anyone have anything against it?
>>>>>
>>>>> In the interest of ASF trademarks, I would suggest it be called "apache/tinkerpop" with "Gremlin" naming reserved for operators and the like, as it is now with GremlinOperator. I think this makes sense because it is connecting to TinkerPop-enabled systems via Gremlin. I would similarly suggest that references to "Apache Gremlin" and the like become "Apache TinkerPop".
>>>>>
>>>>>> I think we are pretty happy with accepting "other apache" projects as providers, so I see no issue with Gremlin, knowing that we can always reach out to our friendly Apache community in case of any issues. So, unless we hear any "opposition" in a few days, I think it would make sense for you to start a `[LAZY CONSENSUS]` thread, without the need for a `[VOTE]` thread.
>>>>>>
>>>>>> One thing that I would love to have, though, is an integration test if possible (we had one for apache.kafka, for example). Those are tests that run **some** graph database locally (via docker-compose) and perform very rudimentary checks against a "real" database rather than a mocked call. That would make the provider more robust.
>>>>>>
>>>>>> More about integration tests, including how to build, run and test them and integrate them into our CI, can be found here: https://github.com/apache/airflow/blob/main/contributing-docs/testing/integration_tests.rst - happy to help if you get stuck with it.
>>>>>>
>>>>>> J.
>>>>>>
>>>>>> On Wed, Feb 26, 2025 at 1:25 PM Ahmad Farhan <ahmad.farhan9...@gmail.com> wrote:
>>>>>>> I pushed changes to move the provider into the "apache" directory. After updating the class references across the project, I re-tested and all tests passed.
>>>>>>>
>>>>>>> Regarding the use of Gremlin (or another graph query language like Cypher or SPARQL) for a common package approach, here are my thoughts on the pros and cons:
>>>>>>>
>>>>>>> Pros (I can see only one):
>>>>>>>
>>>>>>> - Gremlin has been widely adopted by different cloud vendors (e.g. Azure Cosmos DB with Apache Gremlin and AWS Neptune) as well as in self-hosted environments.
>>>>>>>
>>>>>>> Cons:
>>>>>>>
>>>>>>> - Gremlin, Cypher (native to Neo4j) and SPARQL each have their own drivers for executing queries.
>>>>>>> - To achieve a common abstraction, a wrapper around each driver would be required. Each driver has its own connection parameters and underlying protocols, and may need method overrides for compatibility with different Python versions.
>>>>>>> - Not all vendors support every query language; for instance, Gremlin for Neo4j has been deprecated in recent releases, while Cosmos DB does not support Cypher or SPARQL.
>>>>>>>
>>>>>>> While it would be ideal to have a unified graph query language and driver that work seamlessly across different vendors, no such solution exists at the moment. In my opinion, implementing provider-specific solutions for each query language (Gremlin, Cypher, SPARQL) is more realistic and practical given the current landscape.
>>>>>>>
>>>>>>> Happy to discuss further or answer any questions!
>>>>>>>
>>>>>>> Farhan
>>>>>>>
>>>>>>> On Mon, Feb 24, 2025 at 11:33 AM Ahmad Farhan <ahmad.farhan9...@gmail.com> wrote:
>>>>>>>> I have worked with two different graph database vendors: Azure Cosmos DB and Neo4j. During our migration to Neo4j, we discovered that using the Gremlin language wasn't possible; we were forced to rewrite all our queries in Cypher, which is the native language for Neo4j and, in my experience, much simpler for querying.
>>>>>>>>
>>>>>>>> This situation highlights a key challenge for a common abstraction: the underlying query languages and connection/authentication mechanisms vary significantly. Gremlin is not only different from Cypher in syntax but is also deprecated for Neo4j (see https://tinkerpop.apache.org/docs/3.7.3/reference/#neo4j-gremlin).
>>>>>>>>
>>>>>>>> The question is how a common approach could accommodate these different query languages.
>>>>>>>>
>>>>>>>> On Fri, Feb 21, 2025 at 7:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>>>>> Without looking deeply at the code, I love the idea - it's very similar to what we have for common.sql and common.io, and soon common.messaging. A long time ago I also suggested a common.dataframe provider that someone could submit using Apache Ibis: https://lists.apache.org/thread/qx3yh6h0l6jb0kh3fz9q95b3x5b4001l - similarly, I believe there was an idea about common.llm ...
>>>>>>>>>
>>>>>>>>> I think the "common" pattern is a great one for Airflow: it builds on top of "other giants" who build common abstractions, so that you can easily switch between different implementations of various data access layers.
>>>>>>>>>
>>>>>>>>> My suggestion and question, however (I'm not very strong on it, and I would love to hear what others think; I know it was somewhat contentious when I started the Ibis discussion), would be to name them "common.graph" and "common.dataframe" instead of "apache.gremlin" or "apache.ibis", just to stress that those are not implementations of a particular service but an opinionated choice of a particular technology for doing "common" operations. This is essentially what "common.io" is: it would be named the "fsspec" provider if we named it after the library that implements it.
>>>>>>>>>
>>>>>>>>> J.
>>>>>>>>>
>>>>>>>>> On Fri, Feb 21, 2025 at 8:22 PM Ahmad Farhan <ahmad.farhan9...@gmail.com> wrote:
>>>>>>>>>> Hi Everyone,
>>>>>>>>>>
>>>>>>>>>> I've created a draft PR (https://github.com/apache/airflow/pull/46977) to introduce and discuss a new provider for using Gremlin, the graph traversal language of Apache TinkerPop (more details here: https://tinkerpop.apache.org/gremlin.html). Gremlin is supported by various graph database vendors such as Azure Cosmos DB and Amazon Neptune.
>>>>>>>>>> Previously, I had to develop a custom hook to query data from Azure Cosmos DB using Apache Gremlin.
>>>>>>>>>>
>>>>>>>>>> I managed to create a provider and run it locally on the main branch. However, I ran into the BaseHook issue (https://github.com/apache/airflow/issues/45233) on that branch, so I ended up testing it fully on the v2-10-test branch. The PR should be complete, but I've kept it as a draft for now while we discuss the provider.
>>>>>>>>>>
>>>>>>>>>> I'm a new contributor, so I'm especially eager to hear your feedback. Comments on the PR are very welcome, and please feel free to reach out with any questions via email or Slack.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Ahmad Farhan
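
The common-abstraction trade-off debated in this thread (one wrapper per driver, one query text per language) can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the provider's code: GraphBackend, FakeGremlinBackend, FakeCypherBackend, UnsupportedQueryLanguage and run_common_query are hypothetical names, and the fake backends only record the query they receive instead of calling gremlinpython or a Neo4j driver. The endpoint URLs are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional, Protocol


class GraphBackend(Protocol):
    """Hypothetical interface a shared "common.graph" hook could require."""

    query_language: str

    def run(self, query: str, params: Optional[dict[str, Any]] = None) -> list[Any]:
        ...


@dataclass
class FakeGremlinBackend:
    """Stand-in for a gremlinpython-based wrapper; it never opens a connection."""

    url: str  # placeholder endpoint, e.g. a Gremlin Server websocket URL
    traversal_source: str = "g"
    query_language: str = "gremlin"
    calls: list[str] = field(default_factory=list)

    def run(self, query: str, params: Optional[dict[str, Any]] = None) -> list[Any]:
        self.calls.append(query)  # record the query instead of executing it
        return []


@dataclass
class FakeCypherBackend:
    """Stand-in for a Neo4j-driver-based wrapper with different connection settings."""

    uri: str  # placeholder Bolt URI
    database: str = "neo4j"
    query_language: str = "cypher"
    calls: list[str] = field(default_factory=list)

    def run(self, query: str, params: Optional[dict[str, Any]] = None) -> list[Any]:
        self.calls.append(query)  # record the query instead of executing it
        return []


class UnsupportedQueryLanguage(ValueError):
    """Raised when no query text was supplied for the backend's language."""


def run_common_query(backend: GraphBackend, queries: dict[str, str]) -> list[Any]:
    """Send the query written for this backend's language.

    The shared function can hide *how* a query is sent, but the caller still
    has to write one query per language: Gremlin text is not valid Cypher.
    """
    query = queries.get(backend.query_language)
    if query is None:
        raise UnsupportedQueryLanguage(
            f"no {backend.query_language!r} query supplied for {type(backend).__name__}"
        )
    return backend.run(query)


# Usage: the same logical question ("how many people?") has to be written twice.
count_people = {
    "gremlin": "g.V().hasLabel('person').count()",
    "cypher": "MATCH (p:Person) RETURN count(p)",
}
gremlin_backend = FakeGremlinBackend(url="ws://localhost:8182/gremlin")
cypher_backend = FakeCypherBackend(uri="bolt://localhost:7687")
run_common_query(gremlin_backend, count_people)
run_common_query(cypher_backend, count_people)
```

The point of the sketch is the per-language query dictionary: a shared hook can standardize how queries are sent, but it cannot make a Gremlin traversal run on a Cypher-only store, which is the practical argument in the thread for provider-specific hooks over a common.graph package.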