> On Feb 22, 2017, at 10:16 AM, Phil Mayers <p.may...@imperial.ac.uk> wrote:
> 
> On 22/02/17 17:42, Hynek Schlawack wrote:
> 
>> I have to disagree here:  I don’t want build tools of any kind in my
>> final containers therefore I build my artifacts separately no matter
>> what language.  Of course you can just build the venv on your build
> 
> Agreed, 100%. Apologies if I gave you the impression I was advocating 
> otherwise.
> 
>> server without wheeling up a temporary container and then package it
>> using Docker or DEB or whatever.  You should be separating building
>> and running anyway so Python – as much as I’d like Go-style single
>> binaries too – is in no way special here.  The nice thing about
>> temporary containers though is that I can do all of that on my Mac.
> 
> I agree that you need to separate building and installation, and I've got no 
> particular beef with using a container, chroot, throwaway VM or whatever 
> works for people in doing the build phase.
> 
> (What people do with the resultant build output - and in particular whether 
> there is a lot of ignoring of the hard-learned lessons of system package 
> managers going on now - I will not comment on ;o)
> 
> What I was trying to say - badly, apparently - was that the system python 
> *could* be attractive to someone because many dependencies may exist in the 
> OS package list in suitable form, but conversely may not exist in PyPI in 
> binary form for Linux.

Yes, and building these binary artifacts is often harder than some people 
(cough, alpine, cough) seem to think.  But there are better ways to square this 
circle than restricting yourself to the versions of python libraries that 
happen to be available in your distro.
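
(To make "harder than it looks" concrete: Alpine uses musl rather than glibc, so manylinux1 wheels don't apply there at all and every C extension gets compiled from source.  A hedged sketch of what that means in practice, with package names as I recall them from Alpine's repositories:

    # on Alpine, "pip install psycopg2" needs a compiler toolchain plus
    # the C headers for every extension in your dependency tree
    apk add --no-cache gcc musl-dev python3-dev postgresql-dev
    pip install psycopg2

Multiply that header-hunting by every C extension in your requirements file and the cost becomes apparent.)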

> As a very simple example: if you have a traditional (non-container) Linux 
> system hosting a Python application in a virtualenv, and you deploy a Python 
> app to a virtualenv e.g. using Puppet or Ansible, you either need to:
> 
> 1. Use no C extensions
> 2. Hope there's a manylinux1 binary wheel
> 3. Use the OS package and --system-site-packages
> 4. Compile the C extensions and make them available to pip
> 
> #2 seems useful now that I know about it but - correct me if I'm wrong - the 
> manylinux1 permitted C dependencies are super-tiny, and would not permit e.g. 
> cryptography or psycopg2?

Cory already pointed this out tangentially, but I should emphasize: 
'cryptography' and 'psycopg2' are things that you depend on at the Python 
level.  The things you depend on at the C level are libssl, libcrypto, and 
libpq.  If you want to build a manylinux wheel, you need to take this into 
account and statically link those C dependencies, which some projects are 
beginning to do.  (Cryptography _could_ do this today; it already has the 
infrastructure for doing it on macOS and Windows.  The reason it isn't shipping 
manylinux1 wheels right now is the political implication of auto-shipping a 
second copy of openssl to Linux distros that expect to manage security upgrades 
centrally.)
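
For the mechanics: the usual tool here is the manylinux1 build image plus 
auditwheel.  auditwheel doesn't literally statically link; it copies the 
external shared libraries into the wheel and rewrites their paths, which has 
the same self-contained effect.  A sketch (paths and the Python version are 
illustrative, not gospel):

    # inside the vendor-neutral manylinux1 build image
    docker run -it quay.io/pypa/manylinux1_x86_64 bash
    yum install -y postgresql-devel
    /opt/python/cp35-cp35m/bin/pip wheel psycopg2 -w /tmp/wheelhouse
    # graft libpq & friends into the wheel and retag it as manylinux1
    auditwheel repair /tmp/wheelhouse/psycopg2-*.whl -w /tmp/fixed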

> #4 is what you are advocating for I believe? But can we agree that for 
> smaller projects, that might seem like a lot of repeated work if the package 
> is already available in the OS 

If you're going to do #4 with dh_virtualenv, you can build plain "linux" 
wheels, which are vendor-specific and can dynamically link to whatever you want 
(as opposed to manylinux wheels, which are vendor-neutral and must statically 
link everything), and have your .deb depend on the distro packages that contain 
those C libraries.  Manylinux wheels are required for uploading to PyPI, where 
you don't know who may be downloading; on your own infrastructure, where you 
are shipping inside an artifact (like a .deb) whose metadata describes its 
dependencies, "linux" wheels are fine.  Hanging around alone on PyPI as .whl 
files, they'd be mystery meat; inside your .debs, with proper dependency 
metadata, they're not.
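
To sketch what that arrangement looks like (application and library package 
names are hypothetical, and the libssl package name in particular varies by 
distro release):

    # debian/rules -- dh_virtualenv builds the venv into the .deb:
    #!/usr/bin/make -f
    %:
    	dh $@ --with python-virtualenv

    # debian/control, excerpt:
    Build-Depends: debhelper (>= 9), dh-virtualenv, python-dev, libpq-dev
    Depends: ${misc:Depends}, libpq5, libssl1.0.0

The .deb's Depends line is exactly the dependency metadata that makes the 
dynamically-linked "linux" wheels inside it non-mysterious.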

It might seem weird to use Python-specific tooling and per-application 
vendoring for Python dependencies, and yet use distro-global dynamic linking 
for C dependencies.  But, this is actually a perfectly cromulent strategy, and 
I think this bears a more in-depth explanation.

C, and particularly the ecosystem of weird dynamic linker ceremony around C, 
has an extremely robust side-by-side installation ecosystem, which distros 
leverage to great effect.  For example, on the Ubuntu machine sitting next to 
me as I write this, I have libasan0 (4.8.5), libasan1 (4.9.4), libasan2 (5.4.1), 
*and* libasan3 (6.2.0) installed, and this isn't even a computer with a
particularly significant amount of stuff going on!  Nothing ever breaks and 
loads the wrong libasan.N.
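
You can watch this machinery work: each package ships files under its own major 
version, and the SONAME baked into each library is what the dynamic linker 
matches against.  Something like this (paths are Ubuntu-flavored assumptions):

    dpkg -L libasan2 | grep '\.so'
    dpkg -L libasan3 | grep '\.so'
    # the SONAME is embedded in the library itself:
    readelf -d /usr/lib/x86_64-linux-gnu/libasan.so.2 | grep SONAME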

Python, by contrast, tried to do this in a C-ish way, but that attempt resulted 
in this mess: https://packaging.python.org/multi_version_install/, which almost 
nobody uses.  Right at the top of that document: "For many use cases, virtual 
environments address this need without the complication ...".

Even if you are 100%, completely bought into a distro-style way of life, no 
containers at all, everything has to be in a system package to get installed, 
virtualenvs still make more sense than trying to sync up the whole system's 
Python library versions.
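
That can be as simple as having your system package own one virtualenv per 
application (layout and app name hypothetical):

    virtualenv /opt/myapp/venv
    /opt/myapp/venv/bin/pip install -r /opt/myapp/requirements.txt
    /opt/myapp/venv/bin/python -m myapp

Each application gets exactly its own library versions, and the system's Python 
packages stay untouched.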

The reason nobody ever went back and tried to do multi-version installs "right" 
with Python is that the Python and C library ecosystems are fundamentally 
different in a bunch of important ways.  For one thing, Python libraries have 
no such thing as an enforceable ABI, so coupling between libraries and 
applications is much closer than in C.  For another, there's no SOVERSION to 
declare a stable interface against.  Also, many
small C utilities (the ones that would be some of the smaller entries in 
requirements.txt in a Python app) are vendored in or statically linked in 
applications, so the "dependency management" happens prior to the container 
build, in the source repo of the upstream, where it is hidden.  Python 
dependencies often have a far higher rate of churn than C dependencies because 
of the ease of development, which means both more divergence between required 
versions for different applications, and more benefits to being up-to-date for 
the applications that do rev faster.

Finally, the build process for Python packages is much simpler: they're usually 
treated as archives of files that just get moved around, rather than requiring 
the elaborate pre-build steps that C libraries often need to make sure 
everything is smashed into the .so at build time.
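
You can see this for yourself; a wheel is just a zip file with some metadata 
in it:

    pip wheel . -w dist/    # build a wheel of the current project
                            # (assuming it has a setup.py)
    unzip -l dist/*.whl     # ...and it's just an archive of files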

So think of your Python libraries as "vendored in" to your package for these 
reasons, rather than depended upon in the OS, and then participate in the 
broader distro (i.e. "C") ecosystem by building wheels that dynamically link 
whatever distro-level dependencies they need to.
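
Concretely, something like this on a build box that matches your deploy distro 
(paths illustrative):

    # with libpq-dev installed, this produces a vendor-specific
    # linux_x86_64 wheel, dynamically linked against the system libpq
    pip wheel psycopg2 -w wheelhouse/
    # inspect the linkage if you're curious:
    unzip -o wheelhouse/psycopg2-*.whl -d /tmp/w
    ldd /tmp/w/psycopg2/_psycopg*.so

Ship that wheelhouse inside your .deb and you're participating in both 
ecosystems at once.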

> Wondering out loud, I guess it would be possible for OS-compiled python 
> extensions to be somehow virtualenv or relocation-compatible. One could 
> envisage something like:
> 
> virtualenv t
> . t/bin/activate
> pip syspkg-install python-psycopg2
> 
> ...and this going off and grabbing the OS-provided dependency of that name, 
> extracting it, and deploying it into the virtualenv, rather than the system 
> Python.

This is sort of what dh_virtualenv is.  It doesn't set up the mapping for you 
automatically, but you can pretty quickly figure out that python-psycopg2 
build-depends: libpq-dev.
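
The mapping is one apt query away (or you can just let apt do the installing 
for you):

    apt-cache showsrc python-psycopg2 | grep ^Build-Depends
    # or, to install everything needed to build it:
    apt-get build-dep python-psycopg2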

> There are doubtless all sorts of reasons that is not practical.

The main thing is just that you have to decide on a standard format for 
distro-specific metadata, and then go encode it everywhere.  I'm pretty sure 
that distutils-sig would be open to codifying such an extension to the list of 
standardized metadata fields, so that tools can use it.

> Anyway, to be clear - I'm not advocating using the system Python. I'm trying 
> to explain why, based on the efforts we expend locally, it could seem 
> attractive to smaller sites.

To be even clearer: using the system python is fine; it's using the global 
python *environment* that causes the most significant problems.

(Although, of course, the "system python" is probably CPython, and in most 
cases you want to be using PyPy, right?  So yeah don't use the system Python.)

I hope this explanation was helpful to those of you deploying with distro 
tooling!

-glyph