On 2025/02/26 12:38:02 Jarek Potiuk wrote:
> Yeah. `apache/gremlin` seems like a better option then. Does anyone have
> anything against it?

In the interest of ASF trademarks, I would suggest it be called
"apache/tinkerpop", with "Gremlin" naming reserved for operators and the like,
as it is now with GremlinOperator. I think this makes sense because the
provider connects to TinkerPop-enabled systems via Gremlin. I would similarly
suggest that references to "Apache Gremlin" and the like become "Apache
TinkerPop".

> I think we are pretty happy with accepting "other
> apache" projects as providers, so I see no issue with Gremlin - knowing
> that we can always reach out to our friendly Apache Community in case of
> any issues. So - unless we hear any "opposition" in a few days, I
> think it would make sense for you to start a `[LAZY CONSENSUS]` thread -
> without the need for a `[VOTE]` thread.
> 
> One thing that I would love to have, though, is an integration test if
> possible (we had one with apache.kafka, for example) - those are tests
> that could run **some** graph database locally (via docker-compose) and
> run very rudimentary checks against a "real" database, not a mocked
> call. That would make it more robust.
> 
> More about integration tests, how to build, run, test them and integrate
> them in our CI can be found here:
> https://github.com/apache/airflow/blob/main/contributing-docs/testing/integration_tests.rst
> - happy to help if you are stuck with it.
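
As a rough idea of what such a rudimentary check could look like, here is a
minimal sketch. It assumes a Gremlin Server started via docker-compose and
reachable at ws://localhost:8182/gremlin, and it assumes the gremlinpython
driver; the names `make_gremlin_submit`, `smoke_check` and the
`airflow_smoke_test` label are hypothetical and not taken from the PR.

```python
from typing import Any, Callable, List

# Assumption: a Gremlin Server started by docker-compose listens here.
GREMLIN_URL = "ws://localhost:8182/gremlin"

# A "submit" callable takes a Gremlin script and returns the list of results.
Submit = Callable[[str], List[Any]]


def make_gremlin_submit(url: str = GREMLIN_URL) -> Submit:
    """Return a callable that sends a Gremlin script to a live server."""
    # Imported lazily so this module loads without gremlinpython installed.
    from gremlin_python.driver import client

    gremlin_client = client.Client(url, "g")

    def submit(script: str) -> List[Any]:
        # submit() returns a ResultSet; all() gives a future of all results.
        return gremlin_client.submit(script).all().result()

    return submit


def smoke_check(submit: Submit) -> int:
    """Add one vertex, count vertices with its label, then clean up."""
    label = "airflow_smoke_test"  # hypothetical label used only by this check
    submit(f"g.addV('{label}').property('name', 'probe')")
    try:
        count = submit(f"g.V().hasLabel('{label}').count()")[0]
    finally:
        # Always remove the probe vertex, even if the count query fails.
        submit(f"g.V().hasLabel('{label}').drop()")
    return count
```

In an integration test this would be called as
`smoke_check(make_gremlin_submit())` and the result compared with 1; taking
the submit callable as a parameter keeps the check itself independent of the
driver.
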
> 
> J.
> 
> 
> On Wed, Feb 26, 2025 at 1:25 PM Ahmad Farhan <ahmad.farhan9...@gmail.com>
> wrote:
> 
> > I pushed changes to move the provider into the “apache” directory. After
> > updating the class references across the project, I re-tested and all tests
> > passed.
> >
> > Regarding the use of Gremlin (or another graph query language like Cypher
> > and SPARQL) for a common package approach, here are my thoughts on the pros
> > and cons:
> >
> > pros (I can see only one):
> >
> >    - Gremlin has been widely adopted by different cloud vendors (e.g. Azure
> >    Cosmos DB with Apache Gremlin and AWS Neptune) as well as in self-hosted
> >    environments.
> >
> > cons:
> >
> >    - Gremlin, Cypher (native for Neo4j) and SPARQL each have their own
> >    drivers for executing queries.
> >    - To achieve a common abstraction, a wrapper around each driver would be
> >    required. Each driver has its own connection parameters and underlying
> >    protocols, and may need method overrides for compatibility with
> >    different Python versions.
> >    - Not all vendors support every query language; for instance, Gremlin
> >    for Neo4j has been deprecated in recent releases, while Cosmos DB does
> >    not support Cypher or SPARQL.
> >
> > While it would be ideal to have a unified graph query language and driver
> > that works seamlessly across different vendors, such a solution does not
> > exist at the moment. In my opinion, implementing provider-specific
> > solutions for each query language (Gremlin, Cypher, SPARQL) is more
> > realistic and practical given the current landscape.
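
To make the cons above concrete, here is a minimal sketch of what a common
wrapper would have to carry. Everything in it is hypothetical (`GraphBackend`,
`run_query`, the stub executors and the example.invalid endpoints); it is not
an existing Airflow or driver API, and the stubs only stand in for real driver
calls.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class GraphBackend:
    """Hypothetical description of one vendor connection."""

    name: str
    language: str                        # "gremlin", "cypher" or "sparql"
    conn_params: Dict[str, Any]          # shape differs from driver to driver
    execute: Callable[[str], List[Any]]  # thin wrapper around the real driver


class UnsupportedLanguage(Exception):
    """Raised when a query language is not supported by the backend."""


def run_query(backend: GraphBackend, language: str, query: str) -> List[Any]:
    """Run the query only if it is written in the backend's own language."""
    if language != backend.language:
        raise UnsupportedLanguage(
            f"{backend.name} speaks {backend.language}, not {language}"
        )
    return backend.execute(query)


# Stub executors stand in for real driver calls; endpoints are placeholders.
cosmos = GraphBackend(
    name="cosmos",
    language="gremlin",
    conn_params={"url": "wss://example.invalid:443/", "traversal_source": "g"},
    execute=lambda q: [f"gremlin:{q}"],
)
neo4j = GraphBackend(
    name="neo4j",
    language="cypher",
    conn_params={"uri": "bolt://example.invalid:7687", "auth": ("user", "pw")},
    execute=lambda q: [f"cypher:{q}"],
)
```

Even in this toy form, the "common" layer cannot hide the query language or
the connection parameters, so a DAG author still has to know which backend a
query targets.
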
> >
> > Happy to discuss further or answer any questions!
> >
> > Farhan
> >
> > On Mon, Feb 24, 2025 at 11:33 AM Ahmad Farhan <ahmad.farhan9...@gmail.com>
> > wrote:
> >
> > > I have worked with two different graph database vendors: Azure Cosmos DB
> > > and Neo4j. During our migration to Neo4j, we discovered that using the
> > > Gremlin language wasn’t possible; we were forced to rewrite all our
> > > queries into Cypher, which is the native language for Neo4j and, in my
> > > experience, much simpler for querying.
> > >
> > > This situation highlights a key challenge for a common abstraction: the
> > > underlying query languages and connection/authentication mechanisms vary
> > > significantly. Gremlin is not only different from Cypher in syntax but is
> > > also deprecated for Neo4j (see
> > > https://tinkerpop.apache.org/docs/3.7.3/reference/#neo4j-gremlin).
> > >
> > > The question is how the common approach can accommodate these
> > > different query languages.
> > >
> > > On Fri, Feb 21, 2025 at 7:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > >
> > >> Without looking deeply at the code, I love the idea - it's very similar to
> > >> what we have for common.sql and common.io - and soon common.messaging. I
> > >> also suggested, a long time ago, a common.dataframe that someone could
> > >> submit using Apache Ibis:
> > >> https://lists.apache.org/thread/qx3yh6h0l6jb0kh3fz9q95b3x5b4001l -
> > >> similarly, I believe there was an idea about common.llm ...
> > >>
> > >> I think the "common" pattern is a great one for Airflow, to build on top
> > >> of "other giants" who build those common abstractions that let you easily
> > >> switch between different implementations of various data access layers.
> > >>
> > >> My suggestion and question, however (I do not feel strongly about it and
> > >> would love to hear what others think; I know it was somewhat contentious
> > >> when I started the Ibis discussion), would be to name these "common.graph"
> > >> and "common.dataframe" instead of "apache.gremlin" or "apache.ibis", just
> > >> to stress that they are not implementations of a particular service but an
> > >> opinionated choice of a particular technology for doing "common"
> > >> operations. This is essentially what "common.io" is: it would be named the
> > >> "fsspec" provider if we were to name it after the library that implements
> > >> it.
> > >>
> > >> J.
> > >>
> > >>
> > >> On Fri, Feb 21, 2025 at 8:22 PM Ahmad Farhan <ahmad.farhan9...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi Everyone,
> > >> >
> > >> > I’ve created a draft PR (https://github.com/apache/airflow/pull/46977)
> > >> > to introduce and discuss a new provider for Gremlin, the graph traversal
> > >> > language of Apache TinkerPop (more details here:
> > >> > https://tinkerpop.apache.org/gremlin.html). Gremlin is supported by
> > >> > various graph database vendors such as Azure Cosmos DB and Amazon Neptune.
> > >> > Previously, I had to develop a custom hook to query data from Azure
> > >> > Cosmos DB using Apache Gremlin.
> > >> >
> > >> > I managed to create a provider and run it locally on the main branch.
> > >> > However, I ran into the BaseHook issue
> > >> > (https://github.com/apache/airflow/issues/45233) on that branch, so I
> > >> > ended up testing it fully on the v2-10-test branch. The PR should be
> > >> > complete, but I’ve kept it as a draft for now while we discuss the
> > >> > provider.
> > >> >
> > >> > I’m a new contributor, so I’m especially eager to hear your feedback.
> > >> > Comments on the PR are very welcome, and please feel free to reach out
> > >> > with any questions via email or Slack.
> > >> >
> > >> > Thanks,
> > >> > Ahmad Farhan
> > >> >
> > >>
> > >
> >
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org
