Hello everyone,

TL;DR: I have a small proposal - I would like to fix constraints for
all Airflow 2.5.* and 2.6.* versions by updating pymssql from 2.2.7 to
2.2.8 (released today).

I still have to wait for the "complete" release: they have not yet
released the Linux binary variants of the packages for 2.2.8, and
people watching it have flagged this to the maintainers. But I wanted
to get consensus before I start doing it.

Currently, users installing the MSSQL provider for their ARM-based
Airflow get a "build failed" error when pymssql is installed. They
have to use the workaround described here -
https://github.com/apache/airflow/issues/32672#issuecomment-1647007726
and the proposal aims to fix it so that the workaround is not needed
when using constraints. There are already a few issues about it in
our repo.

This is one of the extremely rare cases (it has happened only 2 times
over the last 2 years) where our "reproducible installation" stopped
working for Python versions - because of a `pip` tooling update that
we have no control over. But thanks to our ability to update
constraints, we can fix it.
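
For readers less familiar with the constraints mechanism, here is a
minimal sketch of how a constrained install is put together. The helper
names are hypothetical, the version numbers in the usage note are only
examples, and the `microsoft.mssql` extra is used for illustration; the
URL follows the pattern of the constraint files published in the
Airflow repo.

```python
from __future__ import annotations


def constraints_url(airflow_version: str, python_version: str) -> str:
    """Build the URL of the constraints file for one Airflow/Python pair."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pip_install_args(
    airflow_version: str,
    python_version: str,
    extras: str = "microsoft.mssql",
) -> list[str]:
    """Return the arguments to pass to `pip` for a constrained install."""
    return [
        "install",
        f"apache-airflow[{extras}]=={airflow_version}",
        "--constraint",
        constraints_url(airflow_version, python_version),
    ]
```

Calling `pip_install_args("2.6.3", "3.8")` yields the arguments for
`pip install "apache-airflow[microsoft.mssql]==2.6.3" --constraint <url>`.
Because the pins live in the constraints file and not in the released
package, changing the pymssql pin there is enough to change what this
command installs.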

If we get consensus, I will use the opportunity to add some tooling to
make such updates easier in the future - it requires creating a new
branch for every version and moving the constraint tags - but this is
easy to automate. And I will have an excuse to develop a small tool to
help with that, which we will be able to use in similar cases in the
future (I've done it manually before).
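
To give an idea of the core step such a tool would perform, here is a
rough sketch of bumping a single pin in the text of a constraints
file. It is not the actual tool: `bump_pin` is a hypothetical helper,
and it assumes each package is pinned on its own line as
`name==version`. The branch creation and tag moving are left out.

```python
import re


def bump_pin(constraints_text: str, package: str, new_version: str) -> str:
    """Return constraints_text with the `package==...` line set to new_version."""
    pattern = re.compile(
        rf"^{re.escape(package)}==\S+$",
        flags=re.MULTILINE | re.IGNORECASE,
    )
    # A callable replacement avoids any backslash processing in the new text.
    updated, count = pattern.subn(
        lambda _match: f"{package}=={new_version}", constraints_text
    )
    if count != 1:
        # Refuse to guess when the pin is missing or duplicated.
        raise ValueError(f"expected exactly one pin for {package}, found {count}")
    return updated
```

For this proposal the call would be
`bump_pin(text, "pymssql", "2.2.8")`, applied to every constraints
file of every affected version.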

Some more context:

Two weeks ago, on the 17th of July, Cython released a
long-in-the-making 3.0.0 version with some backwards-incompatible
changes, and while a lot of packages have been made compatible,
pymssql was one of those that were not. The issue did not affect x86
users, because pre-compiled pymssql binaries are available on PyPI
(https://pypi.org/project/pymssql/2.2.7), but ARM users have problems
installing it, because it needs to be compiled on the fly for them.

It caused quite a bit of mayhem in the Python ecosystem - especially
for projects that are not as up-to-date with their dependencies as
Airflow is - most of our dependencies are automatically updated in
the constraints as soon as new versions are released, and many of them
have binary packages already. So given how big a problem it was for
some other projects, having just pymssql being problematic is quite
cool and shows that our approach works :).

Unfortunately we have no control over which version of Cython is used
when compiling pymssql (this is declared by the pymssql package
itself), and new versions of pip use "build isolation" enabled by
default, so it's only up to the package to decide which versions of
the build tools are used. There is a "mostly accidental" - I think -
workaround with the PIP_CONSTRAINT environment variable, but it is
rather complex to pull off, especially in custom Docker images based
on the slim images.
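
A minimal sketch of that environment-variable workaround, for
illustration only. pip reads its `--constraint` option from the
`PIP_CONSTRAINT` environment variable (singular), and the variable is
also seen by the isolated build environment, which is why it can hold
back Cython there. The helper names and the `Cython<3` pin are
assumptions of this sketch, not a copy of what the linked comment or
PR does.

```python
import os
import subprocess
import sys
import tempfile


def build_constraint_env(build_pins, base_env=None):
    """Write build-time pins to a temp file; return (env, path) with PIP_CONSTRAINT set."""
    fd, path = tempfile.mkstemp(prefix="build-constraints-", suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(build_pins) + "\n")
    env = dict(os.environ if base_env is None else base_env)
    env["PIP_CONSTRAINT"] = path
    return env, path


def install_with_build_pins(requirement, build_pins):
    """Run `pip install requirement` with the build-time pins applied."""
    env, path = build_constraint_env(build_pins)
    try:
        subprocess.run(
            [sys.executable, "-m", "pip", "install", requirement],
            env=env,
            check=True,
        )
    finally:
        os.remove(path)  # clean up the temporary constraints file
```

On an ARM machine this would be used as
`install_with_build_pins("pymssql==2.2.7", ["Cython<3"])`. With
pymssql 2.2.8 pinned in the constraints, none of this should be needed.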

I implemented the workaround for our ARM images last week to make
them work - you can see that it's rather complex:
https://github.com/apache/airflow/pull/32748

The 2.2.8 version of pymssql has only one change:

> Version 2.2.8 - 2023-07-30 - Mikhail Terekhov
> Compatibility with Cython. Thanks to matusvalo (Matus Valo) (fix #826).

Why 2.5+?

a) because ARM support for MSSQL was introduced in 2.5.1
b) because 2.4 used pymssql 2.2.5 and there were a few more changes
in 2.2.6, so there is a (low) risk that it would break something
else.

Note that we do NOT have to rebuild our images: the pymssql 2.2.7 in
them was built before Cython 3.0.0, so it is good to go. Since the
only change in 2.2.8 is to make it build with the new Cython, we do
not need to rebuild and re-release our images.

Can we get consensus on it? Does anyone have anything against it?

J.

