Hi, I'm done with my +1 week (started later, ended later - still a week) and wanted to summarize what happened. As usual I'm not talking about all the minor trigger-here/trigger-there work, which would be a waste of virtual ink. But some cases were more interesting than that, and I wanted to mention those here, especially in case others have hit the same issues or will follow up on these cases later.
TL;DR:
- unblocked the libzip transition
  - fixed openscad test fail
  - fixed mysql FTBFS @ riscv64
  - fixed php-easyrdf test race
- unblocked the spyder transition
  - fixed spyder-memory-* tests
  - had spyder-reports removed
- unblocked the opencascade transition
  - fixed freecad OOM @ s390x
  - analyzed netgen fail in depth and reported the issue upstream
  - based on that analysis Locutus uploaded a test skip @ armhf
- resolved iperf 2.0.14a breaking tests
  - analyzed the case affecting mininet/openvswitch
  - eventually had the new version removed from hirsute-proposed
  - upstream bugs filed to resolve this for 21.10
- resolved mininet 2.3.0 breaking tests
  - analyzed the case and uploaded a new openvswitch
  - the new openvswitch and mininet have migrated into hirsute-release now
- resolved some more uncommon build/test issues
  - psmisc done (lost test result) and now migrated into -release
  - ddd done (toolchain issue that is now resolved on rebuild) and migrated

There are 5 more issues left of the "rebuild for fixed permissions" set: 2 FTBFS and 3 test fails; someone should continue on those. Maybe a good candidate for further +1? Much, much more detail on all of this below ...

#1 openscad

The openscad/2021.01-1 autopkgtests never worked - failing since the first of February and still failing now:
  1072 - pdfexporttest_centered (Failed)
All good test results are with the old version 2019.05-5.

This package is entangled with libzip, which blocks quite a bunch of others, so unblocking it would help -proposed with more than just this one package. The same test works fine in Debian CI:
https://ci.debian.net/data/autopkgtest/testing/amd64/o/openscad/10926296/log.gz

I checked a local VM based reproducer, once as-is and once with all of -proposed; both failed. Then followed a long strange trip which eventually led to pkgstripfiles/optipng breaking the test data files. See
https://bugs.launchpad.net/ubuntu/+source/openscad/+bug/1918445
for more details. I uploaded a fix to hirsute and submitted it to Debian (a no-op for them) to be able to sync it later on.

#2 mysql FTBFS on riscv64

I was pinged that bug 1915275 (mysql FTBFS on riscv64) also blocks libzip. Indeed, since glibc was forced into the release without resolving this issue, things got worse. Other than what was stated/assumed on the bug, it isn't just breaking the tests: the existing mysql-8.0 on riscv64 (with the new glibc) now fails to even install. Thereby essentially all of `reverse-depends --release hirsute --build-depends src:mysql-8.0` are FTBFS on riscv64 now, and php7.4 is just one of many - the only difference being that it now blocks the libzip transition.

After a longer debug session I found that the return value of sysconf wasn't handled properly, which breaks the allocation at an unsigned long(-1) size. I've filed a bug and submitted a fix upstream, as well as an MP for the packaging to resolve this in Hirsute asap. Over the weekend I tracked that these builds worked and rebuilt PHP; that aspect no longer blocks php -> libzip.
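To illustrate the failure mode, here is a minimal standalone sketch (this is not the actual mysql-8.0 code, and the sysconf parameter is just an illustrative stand-in - the real call is the one referenced in the upstream bug/fix mentioned above):

  // If sysconf() returns -1 ("value unknown", as seen on riscv64 with the
  // new glibc) and the result is used unchecked as an unsigned size, the
  // allocation is attempted with (unsigned long)-1 bytes and falls over.
  #include <unistd.h>
  #include <cstdio>
  #include <cstdlib>

  int main() {
    long v = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);   // may legitimately be -1 or 0
    unsigned long broken = (unsigned long)v;        // -1 wraps to 18446744073709551615 on 64 bit
    void *p = std::malloc(broken);                  // allocation at unsigned long(-1) size fails (or worse)
    std::printf("sysconf=%ld size=%lu p=%p\n", v, broken, p);

    // Hardened variant: treat "unknown" as a sane default instead of
    // propagating the error value into the size calculation.
    unsigned long safe = (v > 0) ? (unsigned long)v : 64;
    std::printf("fallback size=%lu\n", safe);
    std::free(p);
    return 0;
  }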
#3 php-easyrdf

This is a universe package and it blocks on a rebuild not triggered by our team, so the chances of anyone looking at it without a ping were rather low. Triaging showed that the new version 1.0.0-2 slipped through with an aborting but considered-ok test - but actually it never worked. A check with bryce showed that it wasn't a known case from the recent phpunit activities, but also nothing the team currently looks at. So to unblock libzip I've taken a look at it. It turned out to be a race, which I fixed and submitted to Debian, but also something else that is yet unclear (not reproducible in s390x canonistack, but failing in autopkgtest).

For now a retry frenzy seems to resolve the issue sometimes, but since all tests are marked as "superficial" the state "neutral" is the best one can achieve. The underlying issue seems to be a race between the php server init and the test trying to use it. I have added some hardening against that case and debugged it with s390x runs of autopkgtest-infra against the PPA. I have submitted this to Debian, but since the problem isn't present there (though it could happen at any time) it isn't urgent to them, and we can't wait. So I uploaded an ubuntu1 version to unblock things in hirsute. To have this visible for others looking at excuses I also filed:
https://bugs.launchpad.net/ubuntu/+source/php-easyrdf/+bug/1919125

#5 request-tracker5

I have seen many packages build-depend on request-tracker5, but found that this would actually be a transition and therefore it is good for it to be held back in -proposed until we open 21.10 (there also is a build-time test fail that needs to be resolved). Do we want/need to do more to hold it back so it isn't fixed by accident? Should we remove it from -proposed to avoid that and clear the view a bit - or would that make later re-syncing harder? I didn't reach a full conclusion on this one as other items made more progress. If an AA has a strong opinion on "yeah, remove it" then please feel free to do so.

#6 spyder

spyder has a new major version 4.x which causes autopkgtest failures =>
https://launchpad.net/ubuntu/+source/spyder/4.2.1+dfsg1-3
One of the dependent packages has a new version in -proposed and we need to test against that - I've done so and it resolved the failure. The other one is incompatible with 4.x and was removed in Debian testing =>
https://tracker.debian.org/news/1235339/spyder-reports-removed-from-testing/
IMHO we'd want to do the same, so I pinged AAs to help me with that, and after the removal the rest migrated fine.

#7 freecad

freecad fails its autopkgtests on s390x:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-hirsute/hirsute/s390x/f/freecad/20210310_103548_24da1@/log.gz
This is reproducible on canonistack, and it fails on Debian CI @ s390x just as much:
https://ci.debian.net/data/autopkgtest/testing/s390x/f/freecad/10997677/log.gz
It seems the new version just isn't working fine on s390x; it needs some debugging to decide between a fix or resetting the tests (TBH, CAD @ s390x isn't a really important thing). I found Brian has assessed the same:
https://bugs.launchpad.net/ubuntu/+source/freecad/+bug/1918474
The obvious thought is "mark it as a big test" (so it runs on bigger instances), but I at least wanted confirmation that it then would work, so I spawned a few s390x Hirsute guests of different sizes. This is also tied into the opencascade migration, which I looked at next. A day later Debian had already accepted my changes and uploaded them together with an upstream fix release - this LGTM and didn't need an FFe, so we can sync it to unblock the issue at hand.

#8 opencascade / netgen

After unblocking freecad I found that it was entangled with opencascade. And other than freecad, it was also blocked on netgen, which had a build fail on armhf. There already was a bug report about it under
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=984439
but not much progress on a real solution to it. Also, I'm not 100% sure that the Debian report I've found is indeed the very same issue we face in our builds atm:

  test_pickling.py Fatal Python error: Bus error
  ...
  Bus error (core dumped)

In Debian the error does not happen:
https://buildd.debian.org/status/package.php?p=netgen
So maybe this is a new case caused by glibc/gcc/... being newer? The last successful build was in November 2020, and back then the toolchain was quite different from today's. OTOH, the most recent upload has a change like "[5426125] Fix running tests", and the tests are what breaks, so maybe they weren't run before at all.

In arm64 canonistack plus an armhf LXD container this reproduces just fine:

  $ export PYTHONPATH="$PYTHONPATH:/root/netgen-6.2.2006+really6.2.1905+dfsg/debian/tmp/usr/lib/python3/dist-packages"
  $ apt install python3-tk python3-numpy
  $ cd ~/netgen-6.2.2006+really6.2.1905+dfsg/tests/pytest
  $ LD_LIBRARY_PATH=/root/netgen-6.2.2006+really6.2.1905+dfsg/debian/tmp/usr/lib/$DEB_HOST_MULTIARCH python3 -m pytest -k test_pickling -s
  ...
  test_pickling.py Bus error (core dumped)

The other tests pass:

  test_pickling.py::test_pickle_stl PASSED
  test_pickling.py::test_pickle_occ PASSED
  test_pickling.py::test_pickle_geom2d PASSED
  test_pickling.py::test_pickle_mesh PASSED

Just test_pickle_csg fails, and in this test the failing line is:

  geo_dump = pickle.dumps(geo)

with geo being <netgen.libngpy._csg.CSGeometry object at 0xf6da99b0>.
Running that under python3-dbg and pointing gdb at the core file shows the crash deep inside netgen's pickling code (which is better than a generic pickling issue, I guess):

  #0  0xf659c99e in ngcore::BinaryOutArchive::Write<double> (x=10000000000, this=0xffa90cc4)
      at ./libsrc/stlgeom/../general/../core/archive.hpp:732
  #1  ngcore::BinaryOutArchive::operator& (this=0xffa90cc4, d=@0x26aa6d8: 10000000000)
      at ./libsrc/stlgeom/../general/../core/archive.hpp:681
  #2  0xf641d4de in netgen::Surface::DoArchive (archive=..., this=0x26aa6d0)
      at ./libsrc/csg/surface.hpp:68
  #3  netgen::OneSurfacePrimitive::DoArchive (archive=..., this=0x26aa6d0)
      at ./libsrc/csg/surface.hpp:344
  #4  netgen::QuadraticSurface::DoArchive (this=0x26aa6d0, ar=...)
      at ./libsrc/csg/algprim.hpp:52
  #5  0xf641dc00 in netgen::Sphere::DoArchive (this=0x26aa6d0, ar=...)
      at ./libsrc/csg/algprim.hpp:151
  #6  0xf6434c28 in ngcore::Archive::operator&<netgen::Surface, void> (val=..., this=0xffa90cc4)
      at ./libsrc/csg/../general/../core/archive.hpp:307
  #7  ngcore::Archive::operator&<netgen::Surface> (this=this@entry=0xffa90cc4, p=@0x2727718: 0x26aa6d0)
      at ./libsrc/csg/../general/../core/archive.hpp:490
  #8  0xf6430dca in ngcore::Archive::Do<netgen::Surface*, void> (n=<optimized out>, data=<optimized out>, this=0xffa90cc4)
      at ./libsrc/csg/../general/../core/archive.hpp:280
  #9  ngcore::Archive::operator&<netgen::Surface*> (v=std::vector of length 32, capacity 32 = {...}, this=0xffa90cc4)
      at ./libsrc/csg/../general/../core/archive.hpp:209
  #10 ngcore::SymbolTable<netgen::Surface*>::DoArchive<netgen::Surface*> (ar=..., this=0x2843c64)
      at ./libsrc/csg/../general/../core/symboltable.hpp:44
  #11 ngcore::Archive::operator&<ngcore::SymbolTable<netgen::Surface*>, void> (val=..., this=0xffa90cc4)
      at ./libsrc/csg/../general/../core/archive.hpp:307
  #12 netgen::CSGeometry::DoArchive (this=0x2843c60, archive=...)
      at ./libsrc/csg/csgeom.cpp:329
  #13 0xf648a958 in ngcore::Archive::operator&<netgen::CSGeometry, void> (val=..., this=0xffa90cc4)
      at ./libsrc/csg/../general/../core/archive.hpp:305
  #14 ngcore::Archive::operator&<netgen::CSGeometry> (this=this@entry=0xffa90cc4, p=@0xffa90ba4: 0x2843c60)
      at ./libsrc/csg/../general/../core/archive.hpp:518
  #15 0xf64a4218 in ngcore::NGSPickle<netgen::CSGeometry, ngcore::BinaryOutArchive, ngcore::BinaryInArchive>()::{lambda(netgen::CSGeometry*)#1}::operator()(netgen::CSGeometry*) const (self=<optimized out>, this=<optimized out>)
      at /usr/include/pybind11/pytypes.h:199
  ...

./libsrc/stlgeom/../general/../core/archive.hpp:732 is:

  *reinterpret_cast<T*>(&buffer[ptr]) = x; // NOLINT

with:

  (gdb) p &buffer
  $5 = (std::array<char, 1024> *) 0xffa90d40
  (gdb) p ptr
  $3 = 1

Depending on how the real code (not gdb) interprets this pointer addition, that might explain the SIGBUS, as it reflects an unaligned access: if it adds up to just "0xffa90d41" (which is what happens in gdb) then it fails.
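To make the suspected mechanism concrete, here is a minimal standalone sketch (simplified; this is not netgen's real class, just the same store pattern) of why a write like the one in archive.hpp:732 can raise SIGBUS on armhf, plus the usual memcpy-based alternative:

  // Storing a double through a reinterpret_cast'ed char* at an odd offset
  // is an unaligned access. On armhf the compiler may emit a store
  // instruction (e.g. strd/vstr) that requires proper alignment, and the
  // process dies with SIGBUS - matching the backtrace above. memcpy()
  // expresses the same byte-wise store without the alignment requirement.
  #include <array>
  #include <cstddef>
  #include <cstring>
  #include <cstdio>

  static std::array<char, 1024> buffer;
  static const std::size_t ptr = 1;                  // odd offset, as in the gdb output

  void write_cast(double x) {
    *reinterpret_cast<double*>(&buffer[ptr]) = x;    // undefined behaviour; bus error on armhf
  }

  void write_safe(double x) {
    std::memcpy(&buffer[ptr], &x, sizeof x);         // alignment-safe, writes the same bytes
  }

  int main() {
    write_safe(10000000000.0);                       // fine everywhere
    write_cast(10000000000.0);                       // may crash with SIGBUS on armhf
    std::printf("no bus error on this architecture\n");
    return 0;
  }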
Debugging this deeper without context knowledge will be messy, so maybe I can identify a related toolchain issue or a workaround instead. I built it with gcc-9 and gcc-11 (as it worked in November), but both builds behaved the same way. I checked the older builds: they only "worked" because they didn't run the tests. So it was broken all along, but now it is an FTBFS.

I'm a bit lost here and I doubt I'll be very effective going deeper into the hpp code that defines this. Instead I think I have collected quite some logs and insights, and filed an upstream bug (to discuss/resolve this) as well as a Launchpad bug so that it is visible as an update-excuse:
=> https://bugs.launchpad.net/ubuntu/+source/netgen/+bug/1919335
=> https://github.com/NGSolve/netgen/issues/89
There was no response on this in the days that I was actively on +1 duty; anyone looking at the same case later is recommended to check the current state of these bugs/discussions. A day later Locutus (thanks!) also looked at it and agreed; to resolve the issue for now he made armhf run "dh_auto_test || true". So opencascade will resolve once all of that is complete.

#9 openvswitch test fails

I'm a bit familiar with this, but OTOH it is nothing I'd usually look after (unless I did an upload) as this is mostly at home with the OpenStack team. But seeing things blocked on it for 28 and 136 days indicates this will stay broken unless someone takes a look. Uploads of a new `mininet` as well as a new `iperf` fail to test against this on all architectures. It seems a few people already hit retry on this, but there was no bug or documentation about it yet. It has three sub-tests, of which only the one called "vanilla" breaks: the new mininet makes it "fail" and iperf makes it time out. I was retrying this on autopkgtest-infra (no queues atm) and in a local VM, once with and once without the new packages, for further debugging.
Issues:
- with all-proposed: python2 issues calling python2
- with the new iperf
- with the new mininet
- as-is it fails with "RTNETLINK answers: File exists" when creating the interface pair (s1-eth1,h1-eth0)

Solution:
- mininet 2.3 switched from py2-only to py3-only
  - adapt test dependencies and python calls in d/t/*
  - that also resolved the "interface pair" issues
- iperf 2.0.14 is actually an alpha of 2.1 and has massive changes
  - it has been in proposed long enough for the FF, but 2.1.1 would maybe be better
  - neither the new nor the old mininet is compatible with it yet
  - I'd hate to have the new iperf enter hirsute now and break all kinds of automated tests where it is often used
  - IMHO this iperf build should be removed and get a new chance in 21.10, even better as 2.1 or 2.1.1 then

I debugged the iperf/mininet incompatibility a bit and filed a bug to resolve that mid-term:
=> https://github.com/mininet/mininet/issues/1060
For Hirsute I filed a removal bug for the new iperf version:
=> https://bugs.launchpad.net/ubuntu/+source/iperf/+bug/1919432
And for openvswitch/mininet I opened an MP that will resolve it:
=> https://code.launchpad.net/~paelzer/ubuntu/+source/openvswitch/+git/openvswitch/+merge/399771
James Page reviewed, merged and uploaded that.

Some builds and test re-triggers later all of those were resolved. The new openvswitch tested fine and migrated into hirsute-release, and one test re-trigger later mininet was also working and ready. Furthermore there were some follow-ups in the upstream discussion; it seems that in 21.10 the (then) newer mininet should be compatible with the new iperf.

#10 psmisc

Being such a core package, I was wondering why this had been hanging in excuses for 28 days already. I found that it had a test on armhf that was "lost": it wasn't failed or passed, just non-existent, and from britney's POV it was still waiting for the test result. I guess we'd want that new (minor) upstream release in Hirsute, so I had a look. After re-issuing the test it succeeded and this became ready to migrate.

#11 ddd

This had build errors, but actually is an important rebuild from the hiccup that had created wrong permissions in built packages. Gladly this was a toolchain issue back then and it is now resolved on rebuild:
=> https://launchpad.net/ubuntu/+source/ddd/1:3.3.12-5.3build1

It might be worth noting that there are a few others left of that rebuild burst that are still stuck in one way or the other:

FTBFS:
- https://launchpad.net/ubuntu/+source/clisp/1:2.49.20180218+really2.49.92-3build5
- https://launchpad.net/ubuntu/+source/nng/1.4.0-1build1

Test fails:
- https://launchpad.net/ubuntu/+source/gnome-activity-journal/1.0.0-3build1
- https://launchpad.net/ubuntu/+source/php-apcu/5.1.19+4.0.11-3build1
- https://launchpad.net/ubuntu/+source/ruby-httpclient/2.8.3-2build1

I was out of time, but I'd guess that someone should also look after those?

-- 
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd