Hi,

Earlier this week, I removed the PerlIO layers for automatic UTF-8 conversion from Lintian and the test suite. The changes are in this and the following commit:
https://salsa.debian.org/lintian/lintian/-/commit/a5584fcc6e4c14723006d2f552834d29c2ed314d

The commits are comprehensive in the sense that they make most conversions to and from UTF-8 explicit. The rationale is outlined in Bug#972878. The underlying Perl bugs have been unsolved since 2007 and 2011. The remedy we ultimately adopted was suggested by the good folks on #perl-help.

Due to the broad impact of the changes, which took over a thousand edits, there was concern about introducing new bugs. We do not yet have the ability to explore differences between Lintian versions in the archive (which may be possible with the new website at some point), so for now I only looked at program errors caused by ill-formed UTF-8 octet sequences. They arose in three instances.

In two cases, upstream sources shipped scripts whose hashbang (#!) interpreter lines contained non-UTF-8 characters. Since Debian uses UTF-8 for file names, these scripts cannot run in Debian (and, for files shipped in installable packages, the non-UTF-8 encoding would also be flagged by Lintian), but there is nothing a Debian maintainer can do. The conversion was disabled here:

https://salsa.debian.org/lintian/lintian/-/commit/86997a883d101662fff8e49e844d7b496e0b39e4

It affected the following two sources: the file 'szotar/szoszablya/ragozatlan.2' in magyarispell_1.6.1-2.dsc and the file 'tests/d2/dmd-testsuite/compilable/test13512.d' (a test file) in ldc_1.24.0-1.dsc. Those errors are now gone.

A more difficult (and still unresolved) issue arose in the installable debug package libc6-dbg_2.31-5_amd64.deb. Debug packages are generated by Debian, and should therefore be clean. The file 'usr/lib/debug/.build-id/a2/78dac1d4a7d4aaf37f8c21dba517e3b68663c5.debug' produces readelf output that is not well-formed UTF-8.
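To illustrate what a program error from an ill-formed UTF-8 octet sequence looks like when decoding is explicit, here is a minimal sketch. It is written in Python rather than Lintian's Perl and is not Lintian's code; the function name and the sample byte strings are invented for illustration only.

```python
from typing import Optional, Tuple


def find_ill_formed_utf8(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (position, offending byte) for the first ill-formed
    UTF-8 sequence in data, or None if data decodes cleanly.

    Hypothetical helper, for illustration only.
    """
    try:
        # Strict decoding raises instead of silently substituting characters.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


# Made-up samples: one clean line and one containing a stray 0xFF byte.
clean = b"[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]"
dirty = b"[Requesting program interpreter: \xffGb]"

for sample in (clean, dirty):
    result = find_ill_formed_utf8(sample)
    if result is None:
        print("well-formed UTF-8")
    else:
        position, octet = result
        print(f"ill-formed octet <{octet:02X}> in position {position}")
```

With an explicit decode, the caller chooses per input whether to fail, skip the conversion, or report the position, instead of having an I/O layer decide implicitly.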
The problem can be reproduced with this command:

    readelf --wide --segments --dynamic --section-details --symbols --version-info

Readelf by itself does not guarantee output in UTF-8, but as a result of other restrictions in Debian it should produce nothing else. Perhaps most significantly, this file, a set of generated debug symbols, is literally the only file in our archive that triggers this error.

According to readelf, the file requests an unintelligible interpreter:

    INTERP 0x001000 0x0000000000193f00 0x0000000000193f00 0x000000 0x00001c R 0x10
        [Requesting program interpreter: ���Gb�U3��T��Aopx��a�F?T�e��6�UE?�,y;���?X��A�?�߮��k��?����]

Due to the unique nature of the error, and the garbage potentially provided to readelf, there is a presumption that the debug file was created incorrectly as the result of a bug elsewhere. The file currently produces several program errors like this:

    Warning in group glibc/2.31-5: Can't decode ill-formed UTF-8 octet sequence <FF> in position 10050 at ./lib/Lintian/Index/Objdump.pm line 84.

I plan to follow up with the maintainers of gcc and objcopy, which created the debug file, once I figure out whom to approach.

On a positive note, the UTF-8 changes discussed here are expected to help greatly with the resolution of the open bug for UTF-8 file names, Bug#956233.

Kind regards
Felix Lechner