Hello everyone,

TL;DR: I have a small proposal - I would like to fix the constraints for all Airflow 2.5.* and 2.6.* releases by updating pymssql from 2.2.7 to 2.2.8 (released today).
I still have to wait for the "complete" release: the Linux binary variants of the 2.2.8 packages have not been published yet, and people watching the project have flagged this to the maintainers. But I wanted to get consensus on it before I start.

Currently, users installing the MSSQL provider for their ARM-based Airflow get a "build failed" error when pymssql is installed. They have to use the workaround described here - https://github.com/apache/airflow/issues/32672#issuecomment-1647007726 - and the proposal aims to fix it so that the workaround is not needed when using constraints. There are already a few issues about it in our repo.

This is one of the extremely rare cases (it has happened only 2 times over the last 2 years) where our "reproducible installation" stopped working for Python versions because of a `pip` tooling update that we have no control over. Thanks to the ability to update constraints, we can fix it by updating them.

If we get consensus, I will use the opportunity to add some tooling to make such updates easier in the future. An update requires creating a new branch for every version and moving the constraint tags, but this is easy to automate. And I will have an excuse to develop a small tool to help with that, which we will be able to use in similar cases in the future (I've done it manually before).

Some more context:

Two weeks ago, on the 17th of July, Cython released its long-in-the-making 3.0.0 version with some backwards-incompatible changes. While a lot of packages have been made compatible, pymssql was one of those that were not. The issue did not affect x86 users, because pre-compiled pymssql binaries are available on PyPI (https://pypi.org/project/pymssql/2.2.7), but ARM users have problems installing it, because it has to be compiled on the fly for them.
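To make the proposed change concrete: for each affected constraints file, the update amounts to bumping a single pin. Below is a minimal Python sketch of that step, not the actual tool. The helper names `constraints_url` and `bump_pin` are hypothetical, the URL pattern is the one used for published Airflow constraints files, and 2.6.3 / 3.8 are just example versions.

```python
import re

# URL pattern of the published Airflow constraints files.
CONSTRAINTS_URL = (
    "https://raw.githubusercontent.com/apache/airflow/"
    "constraints-{airflow}/constraints-{python}.txt"
)


def constraints_url(airflow_version: str, python_version: str) -> str:
    """Return the constraints file URL for one Airflow/Python pair (hypothetical helper)."""
    return CONSTRAINTS_URL.format(airflow=airflow_version, python=python_version)


def bump_pin(constraints_text: str, package: str, old: str, new: str) -> str:
    """Replace the `package==old` pin with `package==new` (hypothetical helper).

    Raises ValueError unless exactly one matching pin is found, so a file
    that pins a different version is never changed silently.
    """
    pattern = re.compile(
        rf"^{re.escape(package)}=={re.escape(old)}[ \t]*$", flags=re.MULTILINE
    )
    updated, count = pattern.subn(f"{package}=={new}", constraints_text)
    if count != 1:
        raise ValueError(f"expected exactly one {package}=={old} pin, found {count}")
    return updated


if __name__ == "__main__":
    # Made-up constraints content for illustration; only the pymssql pin matters.
    sample = "some-package==1.0.0\npymssql==2.2.7\nother-package==2.0.0\n"
    print(constraints_url("2.6.3", "3.8"))
    print(bump_pin(sample, "pymssql", "2.2.7", "2.2.8"))
```

Failing loudly when the expected pin is missing is a deliberate choice here: 2.4 used pymssql 2.2.5, so a file without the 2.2.7 pin should be left alone rather than rewritten.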
The Cython 3.0.0 release caused quite a bit of mayhem in the Python ecosystem, especially for projects that are not as up-to-date with their dependencies as Airflow is. Most of our dependencies are automatically updated in the constraints as soon as new versions are released, and many of them already have binary packages. So given how big a problem it was for some other projects, having just pymssql being problematic is quite cool and shows that our approach works :).

Unfortunately, we have no control over which version of Cython is used when compiling pymssql. This is something specified by the pymssql package itself, and new versions of pip have "build-isolation" enabled by default, so it is only up to the package to decide on the versions of the build tools that are used. There is a "mostly accidental" (I think) workaround with the PIP_CONSTRAINT environment variable, but it is rather complex-ish to pull off, especially in custom Docker images based on the slim images. I implemented the workaround for our ARM images last week to make them work, so you can see that it's quite a bit complex-ish: https://github.com/apache/airflow/pull/32748

The 2.2.8 version of pymssql has only one change:

> Version 2.2.8 - 2023-07-30 - Mikhail Terekhov
> Compatibility with Cython. Thanks to matusvalo (Matus Valo) (fix #826).

Why 2.5+?

a) because ARM support for MSSQL was introduced in 2.5.1
b) because 2.4 used the 2.2.5 version of pymssql and there were a few more changes in 2.2.6, so there is a (low) risk that it would break something else.

Note that we do NOT have to rebuild our images: where pymssql 2.2.7 was built before Cython 3.0.0 was released, it is good to go. Since the only change in 2.2.8 is to make it build with the new Cython, we do not need to rebuild and re-release our images.

Can we get consensus on it? Does anyone have anything against it?

J.