This post details the design that John Snow and I are planning for QEMU 8.1. The purpose is to detect possible inconsistencies in the build environment, that could happen on enterprise distros once Python 3.6 support is dropped.
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com> --- v1->v2: add CSS for asciicast note that sphinx is already checked for now-enough Python some more copy-editing _posts/2023-03-22-python.md | 223 ++++++++++++++++++++++++++++++++++++ assets/css/style.css | 4 + 2 files changed, 227 insertions(+) create mode 100644 _posts/2023-03-22-python.md diff --git a/_posts/2023-03-22-python.md b/_posts/2023-03-22-python.md new file mode 100644 index 0000000..d463847 --- /dev/null +++ b/_posts/2023-03-22-python.md @@ -0,0 +1,222 @@ +--- +layout: post +title: "Preparing a consistent Python environment" +date: 2023-03-22 13:30:00 +0000 +categories: [build, python, developers] +--- +Building QEMU is a complex task, split across several programs. +configure finds the host and cross compilers that are needed to build +emulators and firmware; Meson prepares the build environment for the +emulators; finally, Make and ninja actually perform the build, and +in some cases they run tests as well. + +In addition to compiling C code, many build steps run tools and +scripts which are mostly written in the Python language. These include +processing the emulator configuration, code generators for tracepoints +and QAPI, extensions for the Sphinx documentation tool, and the Avocado +testing framework. The Meson build system itself is written in Python, too. + +Some of these tools are run through the `python3` executable, while others +are invoked directly as `sphinx-build` or `meson`, and this can create +inconsistencies. For example, QEMU's `configure` script checks for a +minimum version of Python and rejects too-old interpreters. However, +what would happen if code run by Sphinx used a different version? + +This situation has been largely hypothetical until recently; QEMU's +Python code is already tested with a wide range of versions of the +interpreter, and it would not be a huge issue if Sphinx used a different +version of Python as long as both of them were supported. This will +change in version 8.1 of QEMU, which will bump the minimum supported +version of Python from 3.6 to 3.8. While all the distros that QEMU +supports have a recent-enough interpreter, the default on RHEL8 and +SLES15 is still version 3.6, and that is what all binaries in `/usr/bin` +use unconditionally. + +As of QEMU 8.0, even if `configure` is told to use `/usr/bin/python3.8` +for the build, QEMU's custom Sphinx extensions would still run under +Python 3.6. configure does separately check that Sphinx is executing +with a new enough Python version, but it would be nice if there were +a more generic way to prepare a consistent Python environment. + +This post will explain how QEMU 8.1 will ensure that a single interpreter +is used for the whole of the build process. Getting there will require +some familiarity with Python packaging, so let's start with virtual +environments. + +## Virtual environments + +It is surprisingly hard to find what Python interpreter a given script +will use. You can try to parse the first line of the script, which will +be something like `#! /usr/bin/python3`, but there is no guarantee of +success. For example, on some version of Homebrew `/usr/bin/meson` +will be a wrapper script like: + +```bash +#!/bin/bash +PYTHONPATH="/usr/local/Cellar/meson/0.55.0/lib/python3.8/site-packages" \ + exec "/usr/local/Cellar/meson/0.55.0/libexec/bin/meson" "$@" +``` + +The file with the Python shebang line will be hidden somewhere in +`/usr/local/Cellar`. Therefore, performing some kind of check on the +files in `/usr/bin` is ruled out. QEMU needs to set up a consistent +environment on its own. + +If a user who is building QEMU wanted to do so, the simplest way would +be to use Python virtual environments. A virtual environment takes an +existing Python installation but gives it a local set of Python packages. +It also has its own `bin` directory; place it at the beginning of your +`PATH` and you will be able to control the Python interpreter for scripts +that begin with `#! /usr/bin/env python3`. + +Furthermore, when packages are installed into the virtual environment +with `pip`, they always refer to the Python interpreter that was used to +create the environment. Virtual environments mostly solve the consistency +problem at the cost of an extra `pip install` step to put QEMU's build +dependencies into the environment. + +Unfortunately, this extra step has a substantial downside. Even though +the virtual environment can optionally refer to the base installation's +installed packages, `pip` will always install packages from scratch +into the virtual environment. For all Linux distributions except RHEL8 +and SLES15 this is unnecessary, and users would be happy to build QEMU +using the versions of Meson and Sphinx included in the distribution. + +Even worse, `pip install` will access the Python package index (PyPI) +over the Internet, which is often impossible on build machines that +are sealed from the outside world. Automated installation of PyPI +dependencies may actually be a welcome feature, but it must also remain +a strictly optional feature. + +In other words, the ideal solution would use a non-isolated virtual +environment, to be able to use system packages provided by Linux +distributions; but it would also ensure that scripts (`sphinx-build`, +`meson`, `avocado`) are placed into `bin` just like `pip install` does. + +## Distribution packages + +When it comes to packages, Python surely makes an effort to be confusing. +The fundamental unit for _importing_ code into a Python program is called +a package; for example `os` and `sys` are two examples of a package. +However, a program or library that is distributed on PyPI consists +of _many_ such "import packages": that's because while `pip` is usually +said to be a "package installer" for Python, more precisely it installs +"distribution packages". + +To add to the confusion, the term "distribution package" is often +shortened to _either_ "package" or "distribution". And finally, +the metadata of the distribution package remains available even after +installation, so "distributions" include things that are already +installed (and are not being distributed anywhere). + +All this matters because distribution metadata will be the key to +building the perfect virtual environment. If you look at the content +of `bin/meson` in a virtual environment, after installing the package +with `pip`, this is what you find: + +```python +#!/home/pbonzini/my-venv/bin/python3 +# -*- coding: utf-8 -*- +import re +import sys +from mesonbuild.mesonmain import main +if __name__ == '__main__': + sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0]) + sys.exit(main()) +``` + +This looks a lot like automatically generated code, and in fact it is; +the only parts that vary are the `from mesonbuild.mesonmain import main` +import, and the invocation of the `main()` function on the last line. +`pip` creates this invocation script based on the `setup.cfg` file +in Meson's source code, more specifically based on the following stanza: + +``` +[options.entry_points] +console_scripts = + meson = mesonbuild.mesonmain:main +``` + +Similar declarations exist in Sphinx, Avocado and so on, and accessing their +content is easy via `importlib.metadata` (available in Python 3.8+): + +``` +$ python3 +>>> from importlib.metadata import distribution +>>> distribution('meson').entry_points +[EntryPoint(name='meson', value='mesonbuild.mesonmain:main', group='console_scripts')] +``` + +`importlib` looks up the metadata in the running Python interpreter's +search path; if Meson is installed under another interpreter's `site-packages` +directory, it will not be found: + +``` +$ python3.8 +>>> from importlib.metadata import distribution +>>> distribution('meson').entry_points +Traceback (most recent call last): +... +importlib.metadata.PackageNotFoundError: meson +``` + +So finally we have a plan! `configure` can build a non-isolated virtual +environment, use `importlib` to check that the required packages exist +in the base installation, and create scripts in `bin` that point to the +right Python interpreter. Then, it can optionally use `pip install` to +install the missing packages. + +While this process includes a certain amount of +specialized logic, Python provides a customizable [`venv` +module](https://docs.python.org/3/library/venv.html) to create virtual +environments. The custom steps can be performed by subclassing +`venv.EnvBuilder`. + +This will provide the same experience as QEMU 8.0, except that there will +be no need for the `--meson` and `--sphinx-build` options to the +`configure` script. The path to the Python interpreter is enough to +set up all Python programs used during the build. + +There is only one thing left to fix... + +## Nesting virtual environments + +Remember how we started with a user that creates her own virtual +environment before building QEMU? Well, this would not work +anymore, because virtual environments cannot be nested. As soon +as `configure` creates its own virtual environment, the packages +installed by the user are not available anymore. + +Fortunately, the "appearance" of a nested virtual environment is easy +to emulate. Detecting whether `python3` runs in a virtual environment +is as easy as checking `sys.prefix != sys.base_prefix`; if it is, +we need to retrieve the parent virtual environments `site-packages` +directory: + +``` +>>> import sysconfig +>>> sysconfig.get_path('purelib') +'/home/pbonzini/my-venv/lib/python3.11/site-packages' +``` + +and write it to a `.pth` file in the `lib` directory of the new virtual +environment. The following demo shows how a distribution package in the +parent virtual environment will be available in the child as well: + +<script async id="asciicast-31xjLsR4KjsU9HuhOUpU08tvb" src="https://asciinema.org/a/31xjLsR4KjsU9HuhOUpU08tvb.js"></script> + +A small detail is that `configure`'s new virtual environment should +mirror the isolation setting of the parent. An isolated venv can be +detected because `sys.base_prefix in site.PREFIXES` is false. + +## Conclusion + +Right now, QEMU only makes a minimal attempt at ensuring consistency +of the Python environment; Meson is always run using the interpreter +that was passed to the configure script with `--python` or `$PYTHON`, +but that's it. Once the above technique will be implemented in QEMU 8.1, +there will be no difference in the build experience, but configuration +will be easier and a wider set of invalid build environments will +be detected. We will merge these checks before dropping support for +Python 3.6, so that users on older enterprise distributions will have +a smooth transition. diff --git a/assets/css/style.css b/assets/css/style.css index 2705787..983fb67 100644 --- a/assets/css/style.css +++ b/assets/css/style.css @@ -184,6 +184,10 @@ color: #999999; } + .asciicast { + width: 45em; + } + /* Sections/Articles */ section, -- 2.39.2