I was thinking integration testing might be worth considering for this provider. I will read through the link and implement it. Thanks, Jarek!
Farhan

On Wed, Feb 26, 2025 at 12:40 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Yeah, `apache/gremlin` seems like a better option then. Does anyone have
> anything against it? I think we are pretty happy with accepting "other
> apache" projects as providers, so I see no issue with Gremlin, knowing
> that we can always reach out to our friendly Apache Community in case of
> any issues. So, unless we hear any "opposition" in a few days, I think it
> would make sense for you to start a `[LAZY CONSENSUS]` thread, without the
> need for a `[VOTE]` thread.
>
> One thing I would love to have, though, is an integration test if possible
> (we had one for apache.kafka, for example). Those are tests that run
> **some** graph database locally (via docker-compose) and perform very
> rudimentary checks against a "real" database rather than a mocked call.
> That would make it more robust.
>
> More about integration tests (how to build, run, and test them, and how to
> integrate them into our CI) can be found here:
>
> https://github.com/apache/airflow/blob/main/contributing-docs/testing/integration_tests.rst
>
> Happy to help if you get stuck with it.
>
> J.
>
>
> On Wed, Feb 26, 2025 at 1:25 PM Ahmad Farhan <ahmad.farhan9...@gmail.com>
> wrote:
>
> > I pushed changes to move the provider into the "apache" directory. After
> > updating the class references across the project, I re-tested and all
> > tests passed.
> >
> > Regarding the use of Gremlin (or another graph query language such as
> > Cypher or SPARQL) for a common package approach, here are my thoughts on
> > the pros and cons:
> >
> > Pros (I can see only one):
> >
> > - Gremlin has been widely adopted by different cloud vendors (e.g. Azure
> >   Cosmos DB with Apache Gremlin and AWS Neptune) as well as in
> >   self-hosted environments.
> >
> > Cons:
> >
> > - Gremlin, Cypher (native to Neo4j) and SPARQL each have their own
> >   drivers for executing queries.
> > - To achieve a common abstraction, a wrapper around each driver would be
> >   required. Each driver has its own connection parameters and underlying
> >   protocol, and may need method overrides for compatibility with
> >   different Python versions.
> > - Not all vendors support every query language; for instance, Gremlin
> >   support for Neo4j has been deprecated in recent releases, while Cosmos
> >   DB does not support Cypher or SPARQL.
> >
> > While it would be ideal to have a unified graph query language and driver
> > that works seamlessly across different vendors, such a solution does not
> > exist at the moment. In my opinion, implementing provider-specific
> > solutions for each query language (Gremlin, Cypher, SPARQL) is more
> > realistic and practical given the current landscape.
> >
> > Happy to discuss further or answer any questions!
> >
> > Farhan
> >
> > On Mon, Feb 24, 2025 at 11:33 AM Ahmad Farhan <ahmad.farhan9...@gmail.com>
> > wrote:
> >
> > > I have worked with two different graph database vendors: Azure Cosmos DB
> > > and Neo4j. During our migration to Neo4j, we discovered that using the
> > > Gremlin language wasn't possible; we were forced to rewrite all our
> > > queries in Cypher, which is the native language for Neo4j and, in my
> > > experience, much simpler for querying.
> > >
> > > This situation highlights a key challenge for a common abstraction: the
> > > underlying query languages and connection/authentication mechanisms
> > > vary significantly. Gremlin is not only different from Cypher in syntax
> > > but is also deprecated for Neo4j (see
> > > https://tinkerpop.apache.org/docs/3.7.3/reference/#neo4j-gremlin).
> > >
> > > The question is: how can a common approach accommodate these different
> > > query languages?
> > >
> > > On Fri, Feb 21, 2025 at 7:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > >
> > >> Without looking deeply at the code, I love the idea - it's very similar
> > >> to what we have for common.sql and common.io - and soon
> > >> common.messaging. A long time ago I also suggested common.dataframe,
> > >> which someone could submit using Apache Ibis:
> > >> https://lists.apache.org/thread/qx3yh6h0l6jb0kh3fz9q95b3x5b4001l -
> > >> similarly, I believe there was an idea about common.llm ...
> > >>
> > >> I think the "common" pattern is a great one for Airflow: building on
> > >> top of "other giants" who build those common abstractions, so that you
> > >> can easily switch between different implementations of various data
> > >> access layers.
> > >>
> > >> My suggestion and question, however (I'm not very strong on it and
> > >> would love to hear what others think; I know it was somewhat
> > >> contentious when I started the Ibis discussion), would be to name them
> > >> "common.graph" and "common.dataframe" instead of "apache.gremlin" and
> > >> "apache.ibis" - just to stress that those are not implementations of a
> > >> particular service but an opinionated choice of a particular
> > >> technology to do "common" operations. This is essentially what
> > >> "common.io" is - it would be named the "fsspec" provider if we named
> > >> it after the library that implements it.
> > >>
> > >> J.
> > >>
> > >>
> > >> On Fri, Feb 21, 2025 at 8:22 PM Ahmad Farhan <ahmad.farhan9...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi Everyone,
> > >> >
> > >> > I've created a draft PR (https://github.com/apache/airflow/pull/46977)
> > >> > to introduce and discuss a new provider for using Gremlin, the graph
> > >> > traversal language of Apache TinkerPop (more details here:
> > >> > https://tinkerpop.apache.org/gremlin.html).
> > >> > Gremlin is supported by various graph database vendors such as Azure
> > >> > Cosmos DB and Amazon Neptune. Previously, I had to develop a custom
> > >> > hook to query data from Azure Cosmos DB using Apache Gremlin.
> > >> >
> > >> > I managed to create a provider and run it locally on the main branch.
> > >> > However, I ran into the BaseHook issue
> > >> > (https://github.com/apache/airflow/issues/45233) on that branch, so I
> > >> > ended up testing it fully on the v2-10-test branch. The PR should be
> > >> > complete, but I've kept it as a draft for now while we discuss the
> > >> > provider.
> > >> >
> > >> > I'm a new contributor, so I'm especially eager to hear your feedback.
> > >> > Comments on the PR are very welcome, and please feel free to reach out
> > >> > with any questions via email or Slack.
> > >> >
> > >> > Thanks,
> > >> > Ahmad Farhan