Hello dear Cassandra community,

I am a PMC member of Apache Airflow. Recently we started to look at our
Cassandra provider in the context of the Python 3.12 migration, and the
integration caught my interest.

TL;DR: I am quite confused about which client we should use to be
future-proof, and I would appreciate the community's advice on it. I would
also like to understand why there is no community-managed client, as it
seems that with the current approach any Python project (including ASF
ones) is pretty much forced to rely on a 3rd-party managed way to use
Cassandra, which I find rather strange.

Context:

So far in Apache Airflow we have been using
https://github.com/datastax/python-driver/ to connect to Cassandra. While
working on Python 3.12 compatibility and looking at the driver more
closely, I discovered something strange.

This driver is published on PyPI as "Cassandra driver" [1], which raises a
bit of a question about trademarks - until now I was convinced this driver
is managed by the Cassandra community, but on closer inspection it turned
out that it is - in fact - the Datastax driver. I find it pretty confusing,
to be honest, and with all the debate about ASF trademarks, this should
IMHO raise a few eyebrows and prompt a PMC reaction - if you ask me. As a
PMC member of Apache Airflow I am responsible for raising trademark issues
if I see them, and this one seems to be at odds with the ASF rules. And if
I am confused by the PyPI naming, then I am pretty sure many of the users
are as well.

Note that I am not attacking anyone with that, I just noticed that this
should likely be handled by the PMC somehow (or that would be my advice at
least, as a fellow ASF member and PMC member of a friendly ASF project).

But that's a bit tangential to the problem. Coming back to the main problem.

I did quite some research and it turned out that the driver still uses the
stdlib asyncore module by default (which is removed in Python 3.12). Even
if theoretically we could use the libev reactor, it does not work out of
the box with the released .whl, even if the proper libraries are
installed - you really have to take the sdist and build the package with
gcc configured and libev4/libev-devel installed.

Another option is to use the asyncio reactor [2], but as far as I
understand from the issue [3], this support is still experimental and it's
not ready for prime time.
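
To make the reactor situation above a bit more concrete, here is a minimal
sketch of how one could check which event loop is usable in a given
environment. It uses only the standard library. The function names
(module_available, choose_reactor) are hypothetical helpers made up for
illustration and are not part of the driver, and the preference order is my
own assumption; the module name cassandra.io.libevwrapper is, as far as I
know, the compiled libev extension that the driver's libev reactor imports.

```python
import importlib.util


def module_available(name):
    """Return True if module `name` can be located in this environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. the driver is not installed at all)
        # raises ImportError; treat that as "not available".
        return False


def choose_reactor(available=module_available):
    """Pick a reactor name. The preference order is an assumption of this
    sketch, not something the driver documentation prescribes."""
    if available("cassandra.io.libevwrapper"):
        # The compiled libev extension is present in the installed package.
        return "libev"
    if available("asyncore"):
        # The stdlib module still exists (Python < 3.12).
        return "asyncore"
    # Last resort: the asyncio reactor, reported as experimental in [3].
    return "asyncio"


if __name__ == "__main__":
    print("usable reactor:", choose_reactor())
```

The name returned here would then map to one of the driver's connection
classes (LibevConnection, AsyncoreConnection or AsyncioConnection) passed
as connection_class to Cluster. On Python 3.12 with a wheel that lacks the
libev extension, this sketch falls through to "asyncio", which is exactly
the problematic case.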

This is all captured in the PR [4] where I work on Python 3.12
compatibility, and Cassandra is - literally - the last remaining provider
for which we have to decide what to do.

That makes it rather useless for us - because we would not only complicate
our testing / tooling setup (we have ~90 providers and a pretty complicated
system to manage dependencies already), but it would also require our users
who want to use Python 3.12 to do the same, which is quite a blocker. And
handling user issues in this case would become rather tiring.

In the same PR, Israel Fruchter - who helped us with the Cassandra issue -
suggested that another option is to use the Scylladb driver, which is
100% compatible and is published and released by Scylla [5]. I tested it
and the .whl packages work nicely with libev installed - as expected (and
initially Israel thought the Datastax driver would work similarly). From
Israel's explanation, Datastax and Scylla are cooperating on the driver (in
fact the Scylla one is a fork of the Datastax one), but there is no insight
into who builds the packages and how (which also raised my eyebrow, because
it seems that - unlike in the ASF - the process of building and releasing
the package is not transparent and verifiable).

Now - we have two choices:

1) We can use "cassandra-driver" (which really is a "datastax driver") and
disable Cassandra provider for the users of Airflow for Python 3.12 until
Datastax fixes the compatibility with Python 3.12

2) We can switch to the Scylla driver and release the next provider with
Python 3.12 support

So ... having provided all the context, I have two questions:

Q1: What would be the solution recommended by the community here? I
understand the community has no impact on Datastax decisions and effort on
releasing those drivers, so you can at most ask Datastax to fix the
compatibility issue. As a user I have no insight into what the relations
are between the Cassandra community, Datastax and Scylla, so I am reaching
out here as the place to advise me on which option is best. (This I am
asking as a confused user.)

Q2: I find it pretty worrying that such an important interface (the data
world is driven by Python) is not under the community "umbrella" - it seems
that a very important thing for the users of Cassandra is managed and
controlled by 3rd parties, and the users (as in this case) are pretty much
left at the "mercy" (for lack of a better word) of those 3rd parties - they
are the ones that decide whether Python 3.12 users are able to use
Cassandra. If I had such a situation in Airflow, I would be deeply worried
as a PMC member. What adds to that is the potential trademark issue that
might confuse the users. If I saw such a situation, I'd certainly reach out
to tradema...@apache.org to check if that usage of the name is acceptable
(and I am pretty sure the answer would be "no" - looking at some recent
discussions). I wonder if there were earlier discussions about it and
whether the PMC is aware of the potential confusion it can create.

Again - especially for point Q2 - I know this might be treated as some way
of complaining, but it's more the concern of a fellow user and ASF member
that is at play here - I just find it quite confusing and likely bad for
the community. Maybe I do not understand the context, and there are other
options I am not aware of, but I simply approached it as a user, did
fairly deep research and arrived at these conclusions. So if anything, I
think it would be good if other users who come the same route are not as
confused as I am.


[1] "Cassandra" driver - https://pypi.org/project/cassandra-driver/

[2] Cassandra Asyncio reactor:
https://docs.datastax.com/en/developer/python-driver/3.25/api/cassandra/io/asyncioreactor/

[3] Consider making asyncio reactor the default:
https://datastax-oss.atlassian.net/browse/PYTHON-1375

[4] Python 3.12 support:
https://github.com/apache/airflow/pull/36755#issuecomment-1954688181

[5] Scylladb driver on PyPI: https://pypi.org/project/scylla-driver/

J
