Hello again!
After several days of debugging and branch restructuring, I managed to
create another PR: https://github.com/apache/airflow/pull/47446. A new PR
was needed because of major merge conflicts after the www cleanup.
The renaming of the provider to apache-tinkerpop is done, but I kept the
integration tests under the name 'gremlin' to avoid conflict/confusion with
the server name within the CI.

Please review the PR thoroughly; I am happy to work on the next steps.

Thanks,
Farhan

On Wed, Feb 26, 2025 at 7:43 PM Ahmad Farhan <ahmad.farhan9...@gmail.com>
wrote:

> I did look into the naming and I thought that it would need to be
> discussed at some point before the lazy consensus stage after the dev work
> is done but I guess I was wrong :)
>
> I read through some docs regarding the naming and I kept thinking that
> Apache Gremlin might not be right, so I decided to remove 'Apache' from all
> doc strings. One thing that confused me: a Microsoft documentation page
> says "Apache Gremlin" (here
> https://learn.microsoft.com/en-us/azure/cosmos-db/gremlin/introduction).
>
> I will change the folder to apache/tinkerpop. Thanks Stephen for the
> in-depth explanation.
>
> On Wed, Feb 26, 2025 at 6:56 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Cool. I will let Ahmad comment, but I think we found the **someone** who
>> will help in case there are some future issues with the Tinkerpop/Gremlin
>> provider.
>> I like Gremlin better (it's just a cool name and I like the logo), though
>> TinkerPop has a cool logo as well: https://tinkerpop.apache.org/index.html
>>
>> So as long as we decide not to use common.graph -> I am fine with both :)
>>
>> j.
>>
>>
>> On Wed, Feb 26, 2025 at 7:42 PM Stephen Mallette <spmalle...@gmail.com>
>> wrote:
>>
>> > On Wed, Feb 26, 2025 at 12:57 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>> >
>> > > > In the interest of ASF trademarks, I would suggest it be called
>> > > "apache/tinkerpop" with "Gremlin" naming reserved for operators and
>> the
>> > > like, as it is now with GremlinOperator. I think this makes sense
>> because
>> > > it is connecting to TinkerPop-enabled systems via Gremlin. I would
>> > > similarly suggest that references to "Apache Gremlin" and the like
>> become
>> > > "Apache TinkerPop".
>> > >
>> > > That's an interesting one - indeed TinkerPop is the PMC/ Framework -
>> > > Gremlin is the language.
>> > >
>> > > I am not sure we are actually using TinkerPop here - because
>> TinkerPop is
>> > > the whole framework - Ahmad, can you explain the relation there - do
>> > those
>> > > other systems simply implement Gremlin as a language or do they use
>> > TinkerPop
>> > > for something / as a backend?
>> > >
>> >
>> > I'm sure Ahmad could answer but I'll quickly offer my take. I think
>> that in
>> > this case we should prefer "TinkerPop" over Gremlin as a top-level name
>> > particularly because it's prefixed with "Apache" and there is no "Apache
>> > Gremlin" which I tend to think is confusing when the words are that
>> close
>> > together. I can't recall over the years just how many times I've asked
>> for
>> > corrections in blog posts. :)
>> >
>> > Because that's a bit of a conceptual difference here. For example in the
>> > > provider we are importing https://pypi.org/project/gremlinpython not
>> > > "tinkerpop" - and it also does not have tinkerpop as dependency.
>> > >
>> >
>> > A bit of history goes along with a lot of our naming for what we term
>> > Gremlin Language Variants (GLVs), like gremlinpython, which are
>> variants of
>> > Gremlin natively implemented to allow users to express Gremlin in the
>> > idioms of their own language. They also provide driver connectivity to
>> > compatible servers. TinkerPop has mostly inherited all of its language
>> > variants, including gremlinpython which was the first, from third-party
>> > community developers. As a project, we didn't really get a hand in the
>> > naming so with those projects already in heavy use we just kind of
>> stuck
>> > to it and even doubled-down (like when we built gremlin-go within the
>> ASF).
>> >
>> > I think in this case, your project organization under "apache" seems to
>> > almost lend itself nicely to apache/tinkerpop. I think users will
>> recognize
>> > it as readily as they recognize Gremlin.
>> >
>> >
>> > >
>> > > I wonder if Gremlin is also a trademark of Apache? Maybe we should
>> ask
>> > > tinkerpop PMC what they think about it?
>> > >
>> >
>> > Gremlin is not an ASF trademark. That was debated for quite a long time
>> > with trademarks@ along with deciding if Gremlin, the character and his
>> > friends[1], were to be protected. In the end, for reasons I'm not sure I
>> > quite remember, the ASF didn't think it was necessary.
>> >
>> > Anyway, I'm not sure if you noted my earlier post[2] but I'm one of the
>> > original contributors to Apache TinkerPop, even before we brought it to
>> the
>> > ASF so I'm pretty familiar with our project.  :)
>> >
>> > [1]
>> >
>> >
>> https://github.com/apache/tinkerpop/blob/master/docs/static/images/tinkerpop3-splash.png
>> > [2] https://lists.apache.org/thread/9hf4t8hyk944fyo4q3nygczyo5xhk18y
>> >
>> >
>> > >
>> > >
>> > > J.
>> > >
>> > >
>> > > On Wed, Feb 26, 2025 at 4:55 PM Stephen Mallette <
>> spmalle...@apache.org>
>> > > wrote:
>> > >
>> > > >
>> > > >
>> > > > On 2025/02/26 12:38:02 Jarek Potiuk wrote:
>> > > > > Yeah. `apache/gremlin` seems like a better option then. Does
>> anyone
>> > > have
>> > > > > anything against it?
>> > > >
>> > > > In the interest of ASF trademarks, I would suggest it be called
>> > > > "apache/tinkerpop" with "Gremlin" naming reserved for operators and
>> the
>> > > > like, as it is now with GremlinOperator. I think this makes sense
>> > because
>> > > > it is connecting to TinkerPop-enabled systems via Gremlin. I would
>> > > > similarly suggest that references to "Apache Gremlin" and the like
>> > become
>> > > > "Apache TinkerPop".
>> > > >
>> > > > > I think we are pretty happy with accepting "other
>> > > > > apache" projects as providers, so I see no issue with Gremlin -
>> > knowing
>> > > > > that we can always reach out to our friendly Apache Community in
>> case
>> > > of
>> > > > > any issues. So - unless we hear any "opposition" in a few
>> > days,
>> > > I
>> > > > > think it would make sense if you start `[LAZY CONSENSUS]` thread -
>> > > > > without a need for `[VOTE]` thread.
>> > > > >
>> > > > > One thing though that I would love to have - is to also have an
>> > > > integration
>> > > > > test if possible (we had it with apache.kafka for example) - those
>> > are
>> > > > > tests that could run **some** graphdb database locally (via
>> > > > docker-compose)
>> > > > > and run very rudimentary checks against a "real" database, not a
>> > > mocked
>> > > > > call. That would make it more robust.
>> > > > >
>> > > > > More about integration tests, how to build, run, test them and
>> > > integrate
>> > > > > them in our CI can be found here:
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/airflow/blob/main/contributing-docs/testing/integration_tests.rst
>> > > > > - happy to help if you are stuck with it.
>> > > > >
>> > > > > J.
>> > > > >
>> > > > >
>> > > > > On Wed, Feb 26, 2025 at 1:25 PM Ahmad Farhan <
>> > > ahmad.farhan9...@gmail.com
>> > > > >
>> > > > > wrote:
>> > > > >
>> > > > > > I pushed changes to move the provider into the “apache”
>> directory.
>> > > > After
>> > > > > > updating the class references across the project, I re-tested
>> and
>> > all
>> > > > tests
>> > > > > > passed.
>> > > > > >
>> > > > > > Regarding the use of Gremlin (or another graph query language
>> like
>> > > > Cypher
>> > > > > > and SPARQL) for a common package approach, here are my thoughts
>> on
>> > > the
>> > > > pros
>> > > > > > and cons:
>> > > > > >
>> > > > > > pros (I can see only one):
>> > > > > >
>> > > > > >    - Gremlin has been widely adopted by different cloud vendors
>> > (e.g.
>> > > > Azure
>> > > > > >    Cosmos DB with Apache Gremlin and AWS Neptune) as well as in
>> > > > self-hosted
>> > > > > >    environments.
>> > > > > >
>> > > > > > cons:
>> > > > > >
>> > > > > >    - Gremlin, Cypher (native for Neo4j) and SPARQL each have
>> their
>> > > own
>> > > > > >    drivers for executing queries.
>> > > > > >    - To achieve a common abstraction, a wrapper around each
>> driver
>> > > > would be
>> > > > > >    required. Each driver has its own connection parameters,
>> > > underlying
>> > > > > >    protocols, and may need method overrides for compatibility
>> with
>> > > > > > different
>> > > > > >    Python versions.
>> > > > > >    - Not all vendors support every query language; for instance,
>> > > > Gremlin
>> > > > > >    for Neo4j has been deprecated in recent releases, while
>> Cosmos
>> > DB
>> > > > does
>> > > > > > not
>> > > > > >    support Cypher or SPARQL.
>> > > > > >
>> > > > > > While it would be ideal to have a unified graph query language
>> and
>> > > > driver
>> > > > > > that works seamlessly across different vendors, such a solution
>> > does
>> > > > not
>> > > > > > exist at the moment. In my opinion, implementing
>> provider-specific
>> > > > > > solutions for each query language (Gremlin, Cypher, SPARQL) is
>> more
>> > > > > > realistic and practical given the current landscape.
>> > > > > >
>> > > > > > Happy to discuss further or answer any questions!
>> > > > > >
>> > > > > > Farhan
>> > > > > >
>> > > > > > On Mon, Feb 24, 2025 at 11:33 AM Ahmad Farhan <
>> > > > ahmad.farhan9...@gmail.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > I have worked with two different graph database vendors—Azure
>> > > Cosmos
>> > > > DB
>> > > > > > > and Neo4j. During our migration to Neo4j, we discovered that
>> > using
>> > > > the
>> > > > > > > Gremlin language wasn’t possible; we were forced to rewrite
>> all
>> > our
>> > > > > > queries
>> > > > > > > into Cypher, which is the native language for Neo4j and, in my
>> > > > > > experience,
>> > > > > > > much simpler for querying.
>> > > > > > >
>> > > > > > > This situation highlights a key challenge for a common
>> > abstraction:
>> > > > the
>> > > > > > > underlying query languages and connection/authentication
>> > mechanisms
>> > > > vary
>> > > > > > > significantly. Gremlin is not only different from Cypher in
>> > syntax
>> > > > but is
>> > > > > > > also deprecated for Neo4j (see
>> > > > > > >
>> https://tinkerpop.apache.org/docs/3.7.3/reference/#neo4j-gremlin
>> > ).
>> > > > > > >
>> > > > > > > The question would be how can the common approach accommodate
>> > these
>> > > > > > > different query languages?
>> > > > > > >
>> > > > > > > On Fri, Feb 21, 2025 at 7:36 PM Jarek Potiuk <
>> ja...@potiuk.com>
>> > > > wrote:
>> > > > > > >
>> > > > > > >> Without looking deeply at the code I love the idea - it's very
>> > > > similar to
>> > > > > > >> what we have for common.sql and common.io - and soon
>> > > > common.messaging
>> > > > > > - I
>> > > > > > >> also - long time ago - suggested common.dataframe that
>> someone
>> > > could
>> > > > > > >> submit
>> > > > > > >> using Apache Ibis:
>> > > > > > >>
>> > https://lists.apache.org/thread/qx3yh6h0l6jb0kh3fz9q95b3x5b4001l
>> > > -
>> > > > > > >> similarly I believe there was an idea about common.llm ...
>> > > > > > >>
>> > > > > > >> I think the "common" pattern is a great one for Airflow, to
>> > build
>> > > > on top
>> > > > > > >> of
>> > > > > > >> "other giants" who build those common abstractions that you
>> can
>> > > > easily
>> > > > > > >> switch between different implementations of various data
>> access
>> > > > layers.
>> > > > > > >>
>> > > > > > >> My suggestion and question - would be however (not very
>> strong
>> > on
>> > > > it, I
>> > > > > > >> would love to hear what others think, I know it's been
>> somewhat
>> > > > > > >> contentious
>> > > > > > >> when I started the ibis discussion) - would be to make it
>> > > > > > "common.graph",
>> > > > > > >> "common.dataframe" - instead of "apache.gremlin" or
>> > "apache.ibis"
>> > > -
>> > > > just
>> > > > > > >> to
>> > > > > > >> stress that those are not implementations of particular
>> service
>> > > but
>> > > > > > >> opinionated choice of particular technology to do "common"
>> > > > operations.
>> > > > > > >> This
>> > > > > > >> is what essentially "common.io" is . - it should be named
>> > > "fsspec"
>> > > > > > >> provider
>> > > > > > >> if we were to name it by the "library" that implemented it.
>> > > > > > >>
>> > > > > > >> J.
>> > > > > > >>
>> > > > > > >>
>> > > > > > >> On Fri, Feb 21, 2025 at 8:22 PM Ahmad Farhan <
>> > > > > > ahmad.farhan9...@gmail.com>
>> > > > > > >> wrote:
>> > > > > > >>
>> > > > > > >> > Hi Everyone,
>> > > > > > >> >
>> > > > > > >> > I’ve created a draft PR (
>> > > > https://github.com/apache/airflow/pull/46977
>> > > > > > )
>> > > > > > >> to
>> > > > > > >> > introduce and discuss a new provider for using Gremlin—the
>> > graph
>> > > > > > >> traversal
>> > > > > > >> > language of Apache TinkerPop (more details here:
>> > > > > > >> > https://tinkerpop.apache.org/gremlin.html). Gremlin is
>> > > supported
>> > > > by
>> > > > > > >> > various
>> > > > > > >> > graph database vendors such as Azure Cosmos DB and Amazon
>> > > Neptune.
>> > > > > > >> > Previously, I had to develop a custom hook to query data
>> from
>> > > > Azure
>> > > > > > >> Cosmos
>> > > > > > >> > DB using Apache Gremlin.
>> > > > > > >> >
>> > > > > > >> > I managed to create a provider and run it locally on the
>> main
>> > > > branch.
>> > > > > > >> > However, I ran into the BaseHook issue (
>> > > > > > >> > https://github.com/apache/airflow/issues/45233) on that
>> > branch,
>> > > > so I
>> > > > > > >> ended
>> > > > > > >> > up testing it fully on the v2-10-test branch. The PR
>> should be
>> > > > > > complete,
>> > > > > > >> > but I’ve kept it as a draft for now while we discuss the
>> > > provider.
>> > > > > > >> >
>> > > > > > >> > I’m a new contributor, so I’m especially eager to hear your
>> > > > feedback.
>> > > > > > >> > Comments on the PR are very welcome, and please feel free to
>> > > reach
>> > > > out
>> > > > > > >> with
>> > > > > > >> > any questions via email or Slack.
>> > > > > > >> >
>> > > > > > >> > Thanks,
>> > > > > > >> > Ahmad Farhan
>> > > > > > >> >
>> > > > > > >>
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> ---------------------------------------------------------------------
>> > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> > > > For additional commands, e-mail: dev-h...@airflow.apache.org
>> > > >
>> > > >
>> > >
>> >
>>
>
