Hi everyone, I just wanted to check in on the status of the TinkerPop
Provider PR:

https://github.com/apache/airflow/pull/47446

I don't know whether any work is still outstanding here, but I was curious
whether committers have a sense of when this body of work will be merged
and made available in a release.

Thanks,

Stephen


On Sat, Mar 8, 2025 at 2:57 PM Paul King <pa...@apache.org> wrote:

>
> Nice to see the Apache TinkerPop/Gremlin support. As well as using Gremlin
> with TinkerPop's TinkerGraph, I have also used it with OrientDB, ArcadeDB,
> and Apache HugeGraph. If anyone is interested, I did a blog post here:
> https://groovy.apache.org/blog/groovy-graph-databases
> (Sorry but it's Groovy not Python)
>
> Cheers, Paul.
>
> On 2025/03/08 11:34:11 Ahmad Farhan wrote:
> > Hi!
> > I managed to do the last cleanup on the PR, and it is ready for thorough
> > review. I will create a `[LAZY CONSENSUS]` thread on Monday/Tuesday. Feel
> > free to review and comment on the PR in the meantime.
> >
> > Farhan
> >
> > On Thu, Mar 6, 2025 at 1:40 PM Stephen Mallette <spmalle...@gmail.com>
> > wrote:
> >
> > > Hi Ahmad, thanks for the updates! I've spent some time looking at the
> > > changes and have added some comments and questions. I've pointed out a
> > > couple things for future work that I think will be important, but
> nothing
> > > that needs to be changed for this PR in my mind. Looking forward to
> seeing
> > > your responses, other community feedback, and ultimately a merge of the
> > > provider - take care!
> > >
> > > On Thu, Mar 6, 2025 at 7:28 AM Ahmad Farhan <
> ahmad.farhan9...@gmail.com>
> > > wrote:
> > >
> > > > Hello again!
> > > > > After several days of debugging and branch restructuring I managed to
> > > > > create another PR, https://github.com/apache/airflow/pull/47446. This
> > > > > was necessary because of major merge conflicts after the www cleanup.
> > > > The renaming of the provider to apache-tinkerpop is done but I kept
> the
> > > > integration testing as 'gremlin' to avoid the conflict/confusion
> with the
> > > > server name within the CI.
> > > >
> > > > Please review the PR thoroughly and I am happy to work on the
> following
> > > > steps.
> > > >
> > > > Thanks,
> > > > Farhan
> > > >
> > > > On Wed, Feb 26, 2025 at 7:43 PM Ahmad Farhan <
> ahmad.farhan9...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > I did look into the naming and I thought that it would need to be
> > > > > discussed at some point before the lazy consensus stage after the
> dev
> > > > work
> > > > > is done but I guess I was wrong :)
> > > > >
> > > > > I read through some docs regarding the naming and I kept thinking
> that
> > > > > Apache Gremlin might not be right, so I decided to remove 'Apache'
> from
> > > > all
> > > > > > doc strings. One thing that confused me is that one of Microsoft's
> > > > > > documentation pages says "Apache Gremlin":
> > > > > > https://learn.microsoft.com/en-us/azure/cosmos-db/gremlin/introduction
> > > > >
> > > > > I will change the folder to apache/tinkerpop. Thanks Stephen for
> the
> > > > > in-depth explanation.
> > > > >
> > > > > On Wed, Feb 26, 2025 at 6:56 PM Jarek Potiuk <ja...@potiuk.com>
> wrote:
> > > > >
> > > > >> Cool. I will let Ahmad comment, but I think we found the
> **someone**
> > > who
> > > > >> will help in case there are some future issues with the
> > > > Tinkerpop/Gremlin
> > > > >> provider.
> > > > > >> I like Gremlin better (it's just a cool name and I like the logo),
> > > > > >> but TinkerPop has a cool logo as well:
> > > > > >> https://tinkerpop.apache.org/index.html
> > > > >>
> > > > >> So as long as we decide not to use common.graph -> I am fine with
> both
> > > > :)
> > > > >>
> > > > >> j.
> > > > >>
> > > > >>
> > > > >> On Wed, Feb 26, 2025 at 7:42 PM Stephen Mallette <
> > > spmalle...@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > On Wed, Feb 26, 2025 at 12:57 PM Jarek Potiuk <ja...@potiuk.com
> >
> > > > wrote:
> > > > >> >
> > > > >> > > > In the interest of ASF trademarks, I would suggest it be
> called
> > > > >> > > "apache/tinkerpop" with "Gremlin" naming reserved for
> operators
> > > and
> > > > >> the
> > > > >> > > like, as it is now with GremlinOperator. I think this makes
> sense
> > > > >> because
> > > > >> > > it is connecting to TinkerPop-enabled systems via Gremlin. I
> would
> > > > >> > > similarly suggest that references to "Apache Gremlin" and the
> like
> > > > >> become
> > > > >> > > "Apache TinkerPop".
> > > > >> > >
> > > > >> > > That's an interesting one - indeed TinkerPop is the PMC/
> > > Framework -
> > > > >> > > Gremlin is the language.
> > > > >> > >
> > > > >> > > I am not sure we are actually using TinkerPop here - because
> > > > >> TinkerPop is
> > > > > >> > > the whole framework - Ahmad, can you explain the relation there - do
> > > > > >> > > those other systems simply implement Gremlin as a language, or do
> > > > > >> > > they use TinkerPop for something / as a backend?
> > > > >> > >
> > > > >> >
> > > > >> > I'm sure Ahmad could answer but I'll quickly offer my take. I
> think
> > > > >> that in
> > > > >> > this case we should prefer "TinkerPop" over Gremlin as a
> top-level
> > > > name
> > > > >> > particularly because it's prefixed with "Apache" and there is no
> > > > "Apache
> > > > >> > Gremlin" which I tend to think is confusing when the words are
> that
> > > > >> close
> > > > >> > together. I can't recall over the years just how many times I've
> > > asked
> > > > >> for
> > > > >> > corrections in blog posts. :)
> > > > >> >
> > > > > >> > > Because that's a bit of a conceptual difference here. For example in
> > > > > >> > > the provider we are importing https://pypi.org/project/gremlinpython
> > > > > >> > > not "tinkerpop" - and it also does not have tinkerpop as dependency.
> > > > >> > >
> > > > >> >
> > > > >> > A bit of history goes along with a lot of our naming for what we
> > > term
> > > > >> > Gremlin Language Variants (GLVs), like gremlinpython, which are
> > > > >> variants of
> > > > >> > Gremlin natively implemented to allow users to express Gremlin
> in
> > > the
> > > > >> > idioms of their own language. They also provide driver
> connectivity
> > > to
> > > > >> > compatible servers. TinkerPop has mostly inherited all of its
> > > language
> > > > >> > variants, including gremlinpython which was the first, from
> > > > third-party
> > > > > >> > community developers. As a project, we didn't really get a hand in the
> > > > > >> > naming, so with those projects already in heavy use we just kind of
> > > > > >> > stuck to it and even doubled down (like when we built gremlin-go within
> > > > > >> > the ASF).
> > > > >> >
> > > > > >> > I think in this case, your project organization under "apache" seems to
> > > > > >> > lend itself nicely to apache/tinkerpop. I think users will recognize it
> > > > > >> > as readily as they recognize Gremlin.
> > > > >> >
> > > > >> >
> > > > >> > >
> > > > >> > > I wonder if Gremlin is also a Trademark by Apache ? Maybe we
> > > should
> > > > >> ask
> > > > >> > > tinkerpop PMC what they think about it?
> > > > >> > >
> > > > >> >
> > > > >> > Gremlin is not an ASF trademark. That was debated for quite a
> long
> > > > time
> > > > >> > with trademarks@ along with deciding if Gremlin, the character
> and
> > > > his
> > > > >> > friends[1], were to be protected. In the end, for reasons I'm
> not
> > > > sure I
> > > > >> > quite remember, the ASF didn't think it was necessary.
> > > > >> >
> > > > >> > Anyway, I'm not sure if you noted my earlier post[2] but I'm
> one of
> > > > the
> > > > >> > original contributors to Apache TinkerPop, even before we
> brought it
> > > > to
> > > > >> the
> > > > >> > ASF so I'm pretty familiar with our project.  :)
> > > > >> >
> > > > >> > [1]
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> > >
> https://github.com/apache/tinkerpop/blob/master/docs/static/images/tinkerpop3-splash.png
> > > > >> > [2]
> > > https://lists.apache.org/thread/9hf4t8hyk944fyo4q3nygczyo5xhk18y
> > > > >> >
> > > > >> >
> > > > >> > >
> > > > >> > >
> > > > >> > > J.
> > > > >> > >
> > > > >> > >
> > > > >> > > On Wed, Feb 26, 2025 at 4:55 PM Stephen Mallette <
> > > > >> spmalle...@apache.org>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On 2025/02/26 12:38:02 Jarek Potiuk wrote:
> > > > > >> > > > > Yeah. `apache/gremlin` seems like a better option then. Does
> > > > > >> > > > > anyone have anything against it?
> > > > >> > > >
> > > > >> > > > In the interest of ASF trademarks, I would suggest it be
> called
> > > > >> > > > "apache/tinkerpop" with "Gremlin" naming reserved for
> operators
> > > > and
> > > > >> the
> > > > >> > > > like, as it is now with GremlinOperator. I think this makes
> > > sense
> > > > >> > because
> > > > >> > > > it is connecting to TinkerPop-enabled systems via Gremlin. I
> > > would
> > > > >> > > > similarly suggest that references to "Apache Gremlin" and
> the
> > > like
> > > > >> > become
> > > > >> > > > "Apache TinkerPop".
> > > > >> > > >
> > > > >> > > > > I think we are pretty happy with accepting "other
> > > > >> > > > > apache" projects as providers, so I see no issue with
> Gremlin
> > > -
> > > > >> > knowing
> > > > >> > > > > that we can always reach out to our friendly Apache
> Community
> > > in
> > > > >> case
> > > > >> > > of
> > > > >> > > > > any issues. So - unless we do not hear any "opposition"
> in a
> > > few
> > > > >> > days,
> > > > >> > > I
> > > > >> > > > > think it would make sense if you start `[LAZY CONSENSUS]`
> > > > thread -
> > > > >> > > > > without a need for `[VOTE]` thread.
> > > > >> > > > >
> > > > >> > > > > One thing though that I would love to have - is to also
> have
> > > an
> > > > >> > > > integration
> > > > >> > > > > test if possible (we had it with apache.kafka for
> example) -
> > > > those
> > > > >> > are
> > > > > >> > > > > tests that could run **some** graph database locally (via
> > > > > >> > > > > docker-compose) and run very rudimentary checks against a "real"
> > > > > >> > > > > database, not a mocked call. That would make it more robust.
> > > > >> > > > >
> > > > >> > > > > More about integration tests, how to build, run, test
> them and
> > > > >> > > integrate
> > > > >> > > > > them in our CI can be found here:
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> https://github.com/apache/airflow/blob/main/contributing-docs/testing/integration_tests.rst
> > > > >> > > > > - happy to help if you are stuck with it.
> > > > >> > > > >
> > > > >> > > > > J.
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > On Wed, Feb 26, 2025 at 1:25 PM Ahmad Farhan <
> > > > >> > > ahmad.farhan9...@gmail.com
> > > > >> > > > >
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > I pushed changes to move the provider into the “apache”
> > > > >> directory.
> > > > >> > > > After
> > > > >> > > > > > updating the class references across the project, I
> > > re-tested
> > > > >> and
> > > > >> > all
> > > > >> > > > tests
> > > > >> > > > > > passed.
> > > > >> > > > > >
> > > > >> > > > > > Regarding the use of Gremlin (or another graph query
> > > language
> > > > >> like
> > > > >> > > > Cypher
> > > > >> > > > > > and SPARQL) for a common package approach, here are my
> > > > thoughts
> > > > >> on
> > > > >> > > the
> > > > >> > > > pros
> > > > >> > > > > > and cons:
> > > > >> > > > > >
> > > > >> > > > > > pros (I can see only one):
> > > > >> > > > > >
> > > > >> > > > > >    - Gremlin has been widely adopted by different cloud
> > > > vendors
> > > > >> > (e.g.
> > > > >> > > > Azure
> > > > >> > > > > >    Cosmos DB with Apache Gremlin and AWS Neptune) as
> well as
> > > > in
> > > > >> > > > self-hosted
> > > > >> > > > > >    environments.
> > > > >> > > > > >
> > > > >> > > > > > cons:
> > > > >> > > > > >
> > > > >> > > > > >    - Gremlin, Cypher (native for Neo4j) and SPARQL each
> have
> > > > >> their
> > > > >> > > own
> > > > >> > > > > >    drivers for executing queries.
> > > > >> > > > > >    - To achieve a common abstraction, a wrapper around
> each
> > > > >> driver
> > > > >> > > > would be
> > > > >> > > > > >    required. Each driver has its own connection
> parameters,
> > > > >> > > underlying
> > > > >> > > > > >    protocols, and may need method overrides for
> > > compatibility
> > > > >> with
> > > > >> > > > > > different
> > > > >> > > > > >    Python versions.
> > > > >> > > > > >    - Not all vendors support every query language; for
> > > > instance,
> > > > >> > > > Gremlin
> > > > >> > > > > >    for Neo4j has been deprecated in recent releases,
> while
> > > > >> Cosmos
> > > > >> > DB
> > > > >> > > > does
> > > > >> > > > > > not
> > > > >> > > > > >    support Cypher or SPARQL.
> > > > >> > > > > >
> > > > >> > > > > > While it would be ideal to have a unified graph query
> > > language
> > > > >> and
> > > > >> > > > driver
> > > > >> > > > > > that works seamlessly across different vendors, such a
> > > > solution
> > > > >> > does
> > > > >> > > > not
> > > > >> > > > > > exist at the moment. In my opinion, implementing
> > > > >> provider-specific
> > > > >> > > > > > solutions for each query language (Gremlin, Cypher,
> SPARQL)
> > > is
> > > > >> more
> > > > >> > > > > > realistic and practical given the current landscape.
> > > > >> > > > > >
> > > > >> > > > > > Happy to discuss further or answer any questions!
> > > > >> > > > > >
> > > > >> > > > > > Farhan
> > > > >> > > > > >
> > > > >> > > > > > On Mon, Feb 24, 2025 at 11:33 AM Ahmad Farhan <
> > > > >> > > > ahmad.farhan9...@gmail.com>
> > > > >> > > > > > wrote:
> > > > >> > > > > >
> > > > >> > > > > > > I have worked with two different graph database
> > > > vendors—Azure
> > > > >> > > Cosmos
> > > > >> > > > DB
> > > > >> > > > > > > and Neo4j. During our migration to Neo4j, we
> discovered
> > > that
> > > > >> > using
> > > > >> > > > the
> > > > >> > > > > > > Gremlin language wasn’t possible; we were forced to
> > > rewrite
> > > > >> all
> > > > >> > our
> > > > >> > > > > > queries
> > > > >> > > > > > > into Cypher, which is the native language for Neo4j
> and,
> > > in
> > > > my
> > > > >> > > > > > experience,
> > > > >> > > > > > > much simpler for querying.
> > > > >> > > > > > >
> > > > >> > > > > > > This situation highlights a key challenge for a common
> > > > >> > abstraction:
> > > > >> > > > the
> > > > >> > > > > > > underlying query languages and
> connection/authentication
> > > > >> > mechanisms
> > > > >> > > > vary
> > > > >> > > > > > > significantly. Gremlin is not only different from
> Cypher
> > > in
> > > > >> > syntax
> > > > >> > > > but is
> > > > >> > > > > > > also deprecated for Neo4j (see
> > > > >> > > > > > >
> > > > >> https://tinkerpop.apache.org/docs/3.7.3/reference/#neo4j-gremlin
> > > > >> > ).
> > > > >> > > > > > >
> > > > >> > > > > > > The question would be how can the common approach
> > > > accommodate
> > > > >> > these
> > > > >> > > > > > > different query languages?
> > > > >> > > > > > >
> > > > >> > > > > > > On Fri, Feb 21, 2025 at 7:36 PM Jarek Potiuk <
> > > > >> ja...@potiuk.com>
> > > > >> > > > wrote:
> > > > >> > > > > > >
> > > > > >> > > > > > >> Without looking deeply at the code, I love the idea - it's very
> > > > > >> > > > > > >> similar to
> > > > >> > > > > > >> what we have for common.sql and common.io - and soon
> > > > >> > > > common.messaging
> > > > >> > > > > > - I
> > > > >> > > > > > >> also - long time ago - suggested common.dataframe
> that
> > > > >> someone
> > > > >> > > could
> > > > >> > > > > > >> submit
> > > > >> > > > > > >> using Apache Ibis:
> > > > >> > > > > > >>
> > > > >> >
> https://lists.apache.org/thread/qx3yh6h0l6jb0kh3fz9q95b3x5b4001l
> > > > >> > > -
> > > > >> > > > > > >> similarly I believe there was an idea about
> common.llm
> > > ...
> > > > >> > > > > > >>
> > > > >> > > > > > >> I think the "common" pattern is a great one for
> Airflow,
> > > to
> > > > >> > build
> > > > >> > > > on top
> > > > >> > > > > > >> of
> > > > >> > > > > > >> "other giants" who build those common abstractions
> that
> > > you
> > > > >> can
> > > > >> > > > easily
> > > > >> > > > > > >> switch between different implementations of various
> data
> > > > >> access
> > > > >> > > > layers.
> > > > >> > > > > > >>
> > > > >> > > > > > >> My suggestion and question - would be however (not
> very
> > > > >> strong
> > > > >> > on
> > > > >> > > > it, I
> > > > >> > > > > > >> would love to hear what others think, I know it's
> been
> > > > >> somewhat
> > > > >> > > > > > >> contentious
> > > > >> > > > > > >> when I started the ibis discussion) - would be to
> make it
> > > > >> > > > > > "common.graph",
> > > > >> > > > > > >> "common.dataframe" - instead of "apache.gremlin" or
> > > > >> > "apache.ibis"
> > > > >> > > -
> > > > >> > > > just
> > > > >> > > > > > >> to
> > > > >> > > > > > >> stress that those are not implementations of
> particular
> > > > >> service
> > > > >> > > but
> > > > >> > > > > > >> opinionated choice of particular technology to do
> > > "common"
> > > > >> > > > operations.
> > > > > >> > > > > > >> This is essentially what "common.io" is - it would be named the
> > > > > >> > > > > > >> "fsspec" provider if we were to name it by the "library" that
> > > > > >> > > > > > >> implemented it.
> > > > >> > > > > > >>
> > > > >> > > > > > >> J.
> > > > >> > > > > > >>
> > > > >> > > > > > >>
> > > > >> > > > > > >> On Fri, Feb 21, 2025 at 8:22 PM Ahmad Farhan <
> > > > >> > > > > > ahmad.farhan9...@gmail.com>
> > > > >> > > > > > >> wrote:
> > > > >> > > > > > >>
> > > > >> > > > > > >> > Hi Everyone,
> > > > >> > > > > > >> >
> > > > >> > > > > > >> > I’ve created a draft PR (
> > > > >> > > > https://github.com/apache/airflow/pull/46977
> > > > >> > > > > > )
> > > > >> > > > > > >> to
> > > > >> > > > > > >> > introduce and discuss a new provider for using
> > > > Gremlin—the
> > > > >> > graph
> > > > >> > > > > > >> traversal
> > > > >> > > > > > >> > language of Apache TinkerPop (more details here:
> > > > >> > > > > > >> > https://tinkerpop.apache.org/gremlin.html).
> Gremlin is
> > > > >> > > supported
> > > > >> > > > by
> > > > >> > > > > > >> > various
> > > > >> > > > > > >> > graph database vendors such as Azure Cosmos DB and
> > > Amazon
> > > > >> > > Neptune.
> > > > >> > > > > > >> > Previously, I had to develop a custom hook to query
> > > data
> > > > >> from
> > > > >> > > > Azure
> > > > >> > > > > > >> Cosmos
> > > > >> > > > > > >> > DB using Apache Gremlin.
> > > > >> > > > > > >> >
> > > > >> > > > > > >> > I managed to create a provider and run it locally
> on
> > > the
> > > > >> main
> > > > >> > > > branch.
> > > > >> > > > > > >> > However, I ran into the BaseHook issue (
> > > > >> > > > > > >> > https://github.com/apache/airflow/issues/45233) on
> > > that
> > > > >> > branch,
> > > > >> > > > so I
> > > > >> > > > > > >> ended
> > > > >> > > > > > >> > up testing it fully on the v2-10-test branch. The
> PR
> > > > >> should be
> > > > >> > > > > > complete,
> > > > >> > > > > > >> > but I’ve kept it as a draft for now while we
> discuss
> > > the
> > > > >> > > provider.
> > > > >> > > > > > >> >
> > > > >> > > > > > >> > I’m a new contributor, so I’m especially eager to
> hear
> > > > your
> > > > >> > > > feedback.
> > > > > >> > > > > > >> > Comments on the PR are very welcome, and please feel free to reach
> > > > > >> > > > > > >> > out with any questions via email or Slack.
> > > > >> > > > > > >> >
> > > > >> > > > > > >> > Thanks,
> > > > >> > > > > > >> > Ahmad Farhan
> > > > >> > > > > > >> >
> > > > >> > > > > > >>
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > > >
> > > > >>
> ---------------------------------------------------------------------
> > > > >> > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > > >> > > > For additional commands, e-mail:
> dev-h...@airflow.apache.org
> > > > >> > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>
>
>
