On 10/12/23 at 12:10 +1100, Stuart Prescott wrote:
> Package: qa.debian.org
> Severity: normal
> X-Debbugs-Cc: [email protected]
>
> The 'maintainer' and 'maintainer_email' columns of the upload_history table
> in UDD have truncated email addresses. Somewhere the 'maintainer' data
> is being truncated and then the maintainer_email is consequently broken.
>
> udd=> SELECT maintainer, maintainer_email FROM upload_history WHERE
> maintainer_email LIKE '%=' LIMIT 10;
>                            maintainer                           |               maintainer_email
> ----------------------------------------------------------------+----------------------------------------------
>  Maintainers of GStreamer packages <pkg-gstreamer-maintainers@= | pkg-gstreamer-maintainers@=
>  Maintainers of GStreamer packages <pkg-gstreamer-maintainers@= | pkg-gstreamer-maintainers@=
>  Zenoss Packaging Team <[email protected]=                      | [email protected]=
>  Debian GNOME Maintainers <[email protected].=                  | [email protected].=
>  Debian Perl Group <[email protected]=                          | [email protected]=
>  Debian VoIP Team <[email protected]=                           | [email protected]=
>  Debian Python Modules Team <[email protected].=                | [email protected].=
>  Debian Python Modules Team <[email protected].=                | [email protected].=
>  Debian Firebird Group <[email protected]=                      | [email protected]=
>  Debian Samba Maintainers <[email protected]=                   | [email protected]=
> (10 rows)
>
>
> The input data from the d-d-c mailing list looks fine in the web archive,
> but I can imagine this being due to line wrapping in the mbox files.
>
> Looking at one specific example:
>
> https://lists.debian.org/debian-devel-changes/2007/12/msg00466.html
>
> udd=> SELECT maintainer, maintainer_email FROM upload_history WHERE
> maintainer_email LIKE '%=' AND source = 'libxml-rss-perl' AND version =
> '1.31-3';
>                            maintainer                           |              maintainer_email
> ----------------------------------------------------------------+---------------------------------------------
>  Debian Perl Group <[email protected]=                          | [email protected]=
> (1 row)
>
> This particular example is quite old but the problem also exists in
> recent uploads; as of writing the most recent one is libgetdata (0.11.0-9)
> that was uploaded today.
>
> Of the 850k rows in upload_history, this data issue is in 70k of them.
Hi,

I made some changes to the email decoding that fixed most cases. We are
down to 1162 badly processed emails (from the 70k you reported):
udd=> SELECT count(*) FROM upload_history WHERE maintainer_email LIKE '%=';
count
-------
1162
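
A trailing '=' like the one in these rows is what a quoted-printable soft
line break looks like when the mail body is read without decoding it first.
A minimal Python sketch of that failure mode, assuming this is indeed the
cause (the address and the variable names are made up for illustration, and
this is not the actual UDD importer code):

```python
import quopri

# Hypothetical body fragment: quoted-printable wrapped the long Maintainer
# line and marked the wrap with a trailing "=" (a soft line break).
raw = b"Maintainer: Example Packaging Team <team@lists.exa=\nmple.org>\n"

# Naive parsing: take the first line of the undecoded body.
naive = raw.split(b"\n")[0].split(b": ", 1)[1].decode("ascii")

# Decoding first removes the soft line break and rejoins the two halves.
decoded = quopri.decodestring(raw).decode("ascii")
fixed = decoded.split("\n")[0].split(": ", 1)[1]

print(naive)
print(fixed)
```

The naive value ends in '=' in the same way as the rows matched by the
LIKE '%=' query, while the decoded value carries the complete address.
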
The remaining cases are all from 2022-08-27 or later, which coincides with
dak starting to add a detached signature, so there may still be something
to fix in the code for that case.
udd=> select source, version, date from upload_history where maintainer_email
LIKE '%=' order by date asc limit 10;
source | version | date
----------------------------+---------------+------------------------
libsweble-common-java | 3.0.8-3 | 2022-08-27 20:49:34+00
xeus | 2.4.0-2 | 2022-08-27 20:49:43+00
systemd | 251.4-3 | 2022-08-27 22:05:51+00
cross-toolchain-base-ports | 53 | 2022-08-28 10:04:10+00
opencascade | 7.6.3+dfsg1-3 | 2022-08-28 10:36:28+00
wvkbd | 0.10-1 | 2022-08-28 10:36:40+00
gobject-introspection | 1.73.0+ds-1 | 2022-08-28 10:49:10+00
yade | 2022.01a-11 | 2022-08-28 11:05:40+00
ruby-em-http-request | 1.1.7-1 | 2022-08-28 12:29:29+00
ruby-rails-i18n | 7.0.5-1 | 2022-08-28 14:51:31+00
(10 rows)
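
For the remaining case, one possible approach is sketched below in Python.
It assumes that the announcement is now a multipart message with the
.changes text in a quoted-printable text/plain part and the signature in a
separate part; I have not confirmed that this is the exact layout dak
produces. The sample message, the addresses and the helper name
changes_text are hypothetical.

```python
from email import message_from_bytes
from email.policy import default

# Hypothetical multipart announcement: text part plus a signature part.
RAW = b"""\
From: uploader@example.org
Subject: Accepted example 1.0-1 (source) into unstable
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Source: example
Maintainer: Example Packaging Team <team@lists.exa=
mple.org>

--BOUNDARY
Content-Type: application/pgp-signature; name="signature.asc"

(signature data omitted)
--BOUNDARY--
"""


def changes_text(raw: bytes) -> str:
    """Return the decoded text of the first text/plain part, or ""."""
    msg = message_from_bytes(raw, policy=default)
    # walk() also yields the message itself, so single-part mails work too.
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            # get_content() undoes the transfer encoding and the charset.
            return part.get_content()
    return ""


text = changes_text(RAW)
maintainer = next(
    line.split(": ", 1)[1]
    for line in text.splitlines()
    if line.startswith("Maintainer: ")
)
print(maintainer)
```

Walking the parts and decoding each one separately avoids applying the
top-level transfer encoding (or none at all) to the nested text part.
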
Lucas