UDD contains names where spaces are not stripped

2023-12-07 Thread Andreas Tille
Hi,

by chance I realised that the uploaders table contains some names where names
are not stripped:

udd=> select '"' || u.name || '"' as name_with_spaces, uploader from uploaders 
u where name like '% ' or name like ' %' ;
 name_with_spaces | uploader  
--+---
 " Mehdi Dogguy"  |  Mehdi Dogguy 
 " David Paleino" |  David Paleino 
 " Stéphane Glondu"  |  Stéphane Glondu 
 " Stefano Zacchiroli"|  Stefano Zacchiroli 
 " Stefano Zacchiroli"|  Stefano Zacchiroli 
 " Stefano Zacchiroli"|  Stefano Zacchiroli 
 " Stefano Zacchiroli"|  Stefano Zacchiroli 
 " Stefano Zacchiroli"|  Stefano Zacchiroli 
 "Andreas Tille  "| Andreas Tille   
 " LI Daobing"|  LI Daobing 
 " David Paleino" |  David Paleino 
 " Stefano Zacchiroli"|  Stefano Zacchiroli 
 " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
 " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
 " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
 " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
 " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
 "Colin Tuckley " | Colin Tuckley  
 "Colin Tuckley " | Colin Tuckley  
 "Colin Tuckley " | Colin Tuckley  
(20 rows)


This causes slight errors when counting uploads of people.  My guess is this
is due to some old importer code (I've checked the hit for my name which
is a pretty old upload).  Thus I wonder whether it might be the easiest
fix to simply fix this with some proper UPDATE statement to remove unneeded
spaces.  This statement is doing the trick in my local clone:

   UPDATE uploaders SET name = trim(name), uploader = trim(name) || ' ' || 
email WHERE name like ' %' or name like '% ' ;

If I'm not misleaded historic uploads will not importet from scratch so
this would cure the situation.  Otherwise users need to always remember
adding some trim(name) when dealing with the uploaders.name column not
to mention that it gets even harder to deal with the uploader column
that might feature extra spaces in the middle.

What do you think?

Kind regards
Andreas.

-- 
http://fam-tille.de



Appstream data not in UTF-8?

2023-12-07 Thread Raphael Hertzog
Hello Matthias,

tracker.debian.org has been failing to import the appstream metadata for a
while (since November 26th) with this exception:

Task UpdateAppStreamStatsTask failed with the following traceback.

Traceback (most recent call last):
  File 
"/srv/tracker.debian.org/distro-tracker/distro_tracker/core/tasks/base.py", 
line 378, in run_task
task.execute()
  File 
"/srv/tracker.debian.org/distro-tracker/distro_tracker/core/tasks/base.py", 
line 256, in execute
call_methods_with_prefix(self, 'execute_')
  File 
"/srv/tracker.debian.org/distro-tracker/distro_tracker/core/utils/misc.py", 
line 44, in
call_methods_with_prefix
method(*args, **kwargs)
  File 
"/srv/tracker.debian.org/distro-tracker/distro_tracker/vendor/debian/tracker_tasks.py",
 line 889,
in execute_main
self._load_appstream_hint_stats(component, arch, all_stats)
  File 
"/srv/tracker.debian.org/distro-tracker/distro_tracker/vendor/debian/tracker_tasks.py",
 line 728,
in _load_appstream_hint_stats
hints_json = get_resource_text(url, force_update=self.force_update)
  File 
"/srv/tracker.debian.org/distro-tracker/distro_tracker/core/utils/http.py", 
line 356, in
get_resource_text
return content.decode(encoding)

Exception Type: UnicodeDecodeError
Exception Value: 'utf-8' codec can't decode byte 0xcc in position 990416: 
invalid continuation byte
Request data not supplied


Can you look into this and fix the data? (the precise position of the
error in the stream varied over the days, here you have the position as of
today, December 7th 09:36 UTC)

This refers to those two lines:
url = 'https://appstream.debian.org/hints/sid/{}/Hints-{}.json.gz' \
  .format(section, arch)
hints_json = get_resource_text(url, force_update=self.force_update)

arch is "amd64", an section is likely "main" but it's not impossible that
it's in one of the other sections.

Thank you in advance for your help!
-- 
  ⢀⣴⠾⠻⢶⣦⠀   Raphaël Hertzog 
  ⣾⠁⢠⠒⠀⣿⡁
  ⢿⡄⠘⠷⠚⠋The Debian Handbook: https://debian-handbook.info/get/
  ⠈⠳⣄   Debian Long Term Support: https://deb.li/LTS



Re: Appstream data not in UTF-8?

2023-12-07 Thread Raphael Hertzog
On Thu, 07 Dec 2023, Raphael Hertzog wrote:
> tracker.debian.org has been failing to import the appstream metadata for a
> while (since November 26th) with this exception:

Quick correction. The first time it failed that way was on November 19th
at 08:13 UTC.

Cheers,
-- 
  ⢀⣴⠾⠻⢶⣦⠀   Raphaël Hertzog 
  ⣾⠁⢠⠒⠀⣿⡁
  ⢿⡄⠘⠷⠚⠋The Debian Handbook: https://debian-handbook.info/get/
  ⠈⠳⣄   Debian Long Term Support: https://deb.li/LTS



Re: Appstream data not in UTF-8?

2023-12-07 Thread Andreas Tille
Am Thu, Dec 07, 2023 at 11:03:29AM +0100 schrieb Raphael Hertzog:
> On Thu, 07 Dec 2023, Raphael Hertzog wrote:
> > tracker.debian.org has been failing to import the appstream metadata for a
> > while (since November 26th) with this exception:
> 
> Quick correction. The first time it failed that way was on November 19th
> at 08:13 UTC.

I do not remember the time exactly but my gut feeling says this is pretty
close to the bookworm upgrade.

Kind regards
Andreas. 

-- 
http://fam-tille.de



Processed: some bug triaging

2023-12-07 Thread Debian Bug Tracking System
Processing commands for cont...@bugs.debian.org:

> # to avoid these *.debian.org bugs showing up against the release
> # lets tag them as sid only (although technically that's not totally
> # correct either)
> tags 388141 = sid
Bug #388141 [www.debian.org] www.debian.org SPI copyright claim not legally 
valid until all contributors are contacted for relicensing
Removed tag(s) jessie, bullseye, bookworm, buster-ignore, stretch, trixie, and 
buster.
> tags 548024 sid
Bug #548024 [www.debian.org] packages.debian.org: mirror doesn't close old 
databases
Added tag(s) sid.
> tags 1037324 sid
Bug #1037324 [www.debian.org] d.o/CD/live refers to images no longer built
Added tag(s) sid.
> tags 1055043 sid
Bug #1055043 [qa.debian.org] Debian carnivore: port from Python 2 to 3
Added tag(s) sid.
> tags 1051661 sid
Bug #1051661 [ftp.debian.org] /usr/bin/qemu-system-ppc: Package not installable
Added tag(s) sid.
> tags 939147 sid
Bug #939147 [ftp.debian.org] gcc-7-source 7.4.0-11 vanished from the archive
Added tag(s) sid.
> tags 1023811 sid
Bug #1023811 [ftp.debian.org] 
golang-github-grpc-ecosystem-go-grpc-middleware_1.3.0-1 sources changed in the 
archive?
Added tag(s) sid.
> # apparently "sarge" isn't enough to not have it affect testing
> tags 497471 sid
Bug #497471 [cdimage.debian.org] sarge images have syslinux binaries without 
source
Added tag(s) sid.
> tags 507706 sid etch
Bug #507706 [cdimage.debian.org] Missing sources for d-i components/kernel of 
etch-n-half images
Added tag(s) etch and sid.
> # let's connect this
> affects 1025650 src:nmap
Bug #1025650 [ftp.debian.org] ftp.debian.org: Nmap Public Source License 
Version 0.94 - Is it DFSG-compliant?
Added indication that 1025650 affects src:nmap
> # might be better to add some words
> reassign 993274 release-notes
Bug #993274 [upgrade-reports] machine unbootable after upgrade
Bug reassigned from package 'upgrade-reports' to 'release-notes'.
Ignoring request to alter found versions of bug #993274 to the same values 
previously set
Ignoring request to alter fixed versions of bug #993274 to the same values 
previously set
> # seems like a (hopefully old) grub bug
> reassign 1006369 src:grub2 2.06-3~deb10u1
Bug #1006369 [upgrade-reports] mini-NAS v1 currently not system-upgradable to 
Buster
Bug reassigned from package 'upgrade-reports' to 'src:grub2'.
Ignoring request to alter found versions of bug #1006369 to the same values 
previously set
Ignoring request to alter fixed versions of bug #1006369 to the same values 
previously set
Bug #1006369 [src:grub2] mini-NAS v1 currently not system-upgradable to Buster
Marked as found in versions grub2/2.06-3~deb10u1.
> thanks
Stopping processing here.

Please contact me if you need assistance.
-- 
1006369: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006369
1023811: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1023811
1025650: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1025650
1037324: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1037324
1051661: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1051661
1055043: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1055043
388141: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=388141
497471: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=497471
507706: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=507706
548024: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=548024
939147: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=939147
993274: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=993274
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems



Re: UDD contains names where spaces are not stripped

2023-12-07 Thread Lucas Nussbaum
On 07/12/23 at 09:58 +0100, Andreas Tille wrote:
> Hi,
> 
> by chance I realised that the uploaders table contains some names where names
> are not stripped:
> 
> udd=> select '"' || u.name || '"' as name_with_spaces, uploader from 
> uploaders u where name like '% ' or name like ' %' ;
>  name_with_spaces | uploader  
> --+---
>  " Mehdi Dogguy"  |  Mehdi Dogguy 
>  " David Paleino" |  David Paleino 
>  " Stéphane Glondu"  |  Stéphane Glondu 
>  " Stefano Zacchiroli"|  Stefano Zacchiroli 
>  " Stefano Zacchiroli"|  Stefano Zacchiroli 
>  " Stefano Zacchiroli"|  Stefano Zacchiroli 
>  " Stefano Zacchiroli"|  Stefano Zacchiroli 
>  " Stefano Zacchiroli"|  Stefano Zacchiroli 
>  "Andreas Tille  "| Andreas Tille   
>  " LI Daobing"|  LI Daobing 
>  " David Paleino" |  David Paleino 
>  " Stefano Zacchiroli"|  Stefano Zacchiroli 
>  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
>  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
>  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
>  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
>  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
>  "Colin Tuckley " | Colin Tuckley  
>  "Colin Tuckley " | Colin Tuckley  
>  "Colin Tuckley " | Colin Tuckley  
> (20 rows)
> 
> 
> This causes slight errors when counting uploads of people.  My guess is this
> is due to some old importer code (I've checked the hit for my name which
> is a pretty old upload).  Thus I wonder whether it might be the easiest
> fix to simply fix this with some proper UPDATE statement to remove unneeded
> spaces.  This statement is doing the trick in my local clone:
> 
>UPDATE uploaders SET name = trim(name), uploader = trim(name) || ' ' || 
> email WHERE name like ' %' or name like '% ' ;
> 
> If I'm not misleaded historic uploads will not importet from scratch so
> this would cure the situation.  Otherwise users need to always remember
> adding some trim(name) when dealing with the uploaders.name column not
> to mention that it gets even harder to deal with the uploader column
> that might feature extra spaces in the middle.
> 
> What do you think?

Hi,

Uploaders is refreshed every few hours from archive data, so a one-time
UPDATE would not help. UDD usually tries to preserve inaccuracies, so
those might be interesting for QA work.
In your case, why don't you use the email address to identify uploaders?
(possibly combining it with the carnivore data to identify different emails
belonging to the same person ?)

Lucas



Re: UDD contains names where spaces are not stripped

2023-12-07 Thread Andreas Tille
Am Thu, Dec 07, 2023 at 07:59:38PM +0100 schrieb Lucas Nussbaum:
> On 07/12/23 at 09:58 +0100, Andreas Tille wrote:
> > Hi,
> > 
> > by chance I realised that the uploaders table contains some names where 
> > names
> > are not stripped:
> > 
> > udd=> select '"' || u.name || '"' as name_with_spaces, uploader from 
> > uploaders u where name like '% ' or name like ' %' ;
> >  name_with_spaces | uploader  
> > --+---
> >  " Mehdi Dogguy"  |  Mehdi Dogguy 
> >  " David Paleino" |  David Paleino 
> >  " Stéphane Glondu"  |  Stéphane Glondu 
> >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> >  "Andreas Tille  "| Andreas Tille   
> >  " LI Daobing"|  LI Daobing 
> >  " David Paleino" |  David Paleino 
> >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> >  "Colin Tuckley " | Colin Tuckley  
> >  "Colin Tuckley " | Colin Tuckley  
> >  "Colin Tuckley " | Colin Tuckley  
> > (20 rows)
> > ...
> >UPDATE uploaders SET name = trim(name), uploader = trim(name) || ' ' || 
> > email WHERE name like ' %' or name like '% ' ;
> > 
> 
> Uploaders is refreshed every few hours from archive data, so a one-time
> UPDATE would not help. UDD usually tries to preserve inaccuracies, so
> those might be interesting for QA work.

OK.

> In your case, why don't you use the email address to identify uploaders?

Since this also does not work:

udd=> SELECT count(*), uploader FROM uploaders WHERE name ilike '%tille%' GROUP 
BY uploader;
 count |  uploader  
---+
 1 | Andreas Tille   
 1 | Andreas Tille 
  8785 | Andreas Tille 
(3 Zeilen)

> (possibly combining it with the carnivore data to identify different emails
> belonging to the same person ?)

I could fiddle around with carnivore but that's overkill for thst
purpose and I insist that not stripping blanks from names does not make
any sense, IMHO.  (1 Zeile)


BTW:  I found 

udd=> SELECT count(*), name FROM (SELECT CASE WHEN changed_by_name = '' THEN 
maintainer_name ELSE changed_by_name END AS name FROM upload_history) uh WHERE 
name ilike '%tille%'  group by name;
 count | name  
---+---
 16524 | Andreas Tille
(1 Zeile)

So why do I have 8707 uploads per uploaders but 16524 per upload_history?

Is my assumption wrong that both values should match (modulo some wrongly
spelled names)

Kind regards
Andreas.

-- 
http://fam-tille.de



Re: UDD contains names where spaces are not stripped

2023-12-07 Thread Lucas Nussbaum
On 07/12/23 at 20:24 +0100, Andreas Tille wrote:
> Am Thu, Dec 07, 2023 at 07:59:38PM +0100 schrieb Lucas Nussbaum:
> > On 07/12/23 at 09:58 +0100, Andreas Tille wrote:
> > > Hi,
> > > 
> > > by chance I realised that the uploaders table contains some names where 
> > > names
> > > are not stripped:
> > > 
> > > udd=> select '"' || u.name || '"' as name_with_spaces, uploader from 
> > > uploaders u where name like '% ' or name like ' %' ;
> > >  name_with_spaces | uploader  
> > > --+---
> > >  " Mehdi Dogguy"  |  Mehdi Dogguy 
> > >  " David Paleino" |  David Paleino 
> > >  " Stéphane Glondu"  |  Stéphane Glondu 
> > >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> > >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> > >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> > >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> > >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> > >  "Andreas Tille  "| Andreas Tille   
> > >  " LI Daobing"|  LI Daobing 
> > >  " David Paleino" |  David Paleino 
> > >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> > >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> > >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> > >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> > >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> > >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> > >  "Colin Tuckley " | Colin Tuckley  
> > >  "Colin Tuckley " | Colin Tuckley  
> > >  "Colin Tuckley " | Colin Tuckley  
> > > (20 rows)
> > > ...
> > >UPDATE uploaders SET name = trim(name), uploader = trim(name) || ' ' 
> > > || email WHERE name like ' %' or name like '% ' ;
> > > 
> > 
> > Uploaders is refreshed every few hours from archive data, so a one-time
> > UPDATE would not help. UDD usually tries to preserve inaccuracies, so
> > those might be interesting for QA work.
> 
> OK.
> 
> > In your case, why don't you use the email address to identify uploaders?
> 
> Since this also does not work:
> 
> udd=> SELECT count(*), uploader FROM uploaders WHERE name ilike '%tille%' 
> GROUP BY uploader;
>  count |  uploader  
> ---+
>  1 | Andreas Tille   
>  1 | Andreas Tille 
>   8785 | Andreas Tille 
> (3 Zeilen)
> 
> > (possibly combining it with the carnivore data to identify different emails
> > belonging to the same person ?)
> 
> I could fiddle around with carnivore but that's overkill for thst
> purpose and I insist that not stripping blanks from names does not make
> any sense, IMHO.  (1 Zeile)
> 
> 
> BTW:  I found 
> 
> udd=> SELECT count(*), name FROM (SELECT CASE WHEN changed_by_name = '' THEN 
> maintainer_name ELSE changed_by_name END AS name FROM upload_history) uh 
> WHERE name ilike '%tille%'  group by name;
>  count | name  
> ---+---
>  16524 | Andreas Tille
> (1 Zeile)
> 
> So why do I have 8707 uploads per uploaders but 16524 per upload_history?
> 
> Is my assumption wrong that both values should match (modulo some wrongly
> spelled names)

If you look at the uploaders table, there are three columns:
- 'uploader', than contains the raw data
- 'name' and 'email' that contain the parsed (and trimmed) data

udd=> select uploader, name, email, count(*) from uploaders where uploader 
ilike '%tille%' group by 1,2,3;
  uploader  |  name   |  email   | 
count 
+-+--+---
 Andreas Tille| Andreas Tille   | ti...@debian.org |  8785
 Andreas Tille| Andreas Tille   | andr...@an3as.eu | 1
 Andreas Tille| Andreas Tille   | ti...@debian.org | 1

So, just use name and/or email?

Lucas



Re: UDD contains names where spaces and quotes are not stripped

2023-12-07 Thread Andreas Tille
Am Thu, Dec 07, 2023 at 08:36:12PM +0100 schrieb Lucas Nussbaum:
> On 07/12/23 at 20:24 +0100, Andreas Tille wrote:
> > Am Thu, Dec 07, 2023 at 07:59:38PM +0100 schrieb Lucas Nussbaum:
> > > On 07/12/23 at 09:58 +0100, Andreas Tille wrote:
> > > > 
> > > > udd=> select '"' || u.name || '"' as name_with_spaces, uploader from 
> > > > uploaders u where name like '% ' or name like ' %' ;
> > > >  name_with_spaces | uploader  
> > > > --+---
> > > >  " Mehdi Dogguy"  |  Mehdi Dogguy 
> > > >  " David Paleino" |  David Paleino 
> > > >  " Stéphane Glondu"  |  Stéphane Glondu 
> > > >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> > > >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> > > >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> > > >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> > > >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> > > >  "Andreas Tille  "| Andreas Tille   
> > > >  " LI Daobing"|  LI Daobing 
> > > >  " David Paleino" |  David Paleino 
> > > >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> > > >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> > > >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> > > >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> > > >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> > > >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> > > >  "Colin Tuckley " | Colin Tuckley  
> > > >  "Colin Tuckley " | Colin Tuckley  
> > > >  "Colin Tuckley " | Colin Tuckley  
> > > > (20 rows)
> > > > ...
> > > >UPDATE uploaders SET name = trim(name), uploader = trim(name) || ' ' 
> > > > || email WHERE name like ' %' or name like '% ' ;
> > > > 
> > 
> > 
> > BTW:  I found 
> > 
> > udd=> SELECT count(*), name FROM (SELECT CASE WHEN changed_by_name = '' 
> > THEN maintainer_name ELSE changed_by_name END AS name FROM upload_history) 
> > uh WHERE name ilike '%tille%'  group by name;
> >  count | name  
> > ---+---
> >  16524 | Andreas Tille
> > (1 Zeile)
> > 
> > So why do I have 8707 uploads per uploaders but 16524 per upload_history?

???

> > Is my assumption wrong that both values should match (modulo some wrongly
> > spelled names)

Could you please comment on these different results?
 
> If you look at the uploaders table, there are three columns:
> - 'uploader', than contains the raw data
> - 'name' and 'email' that contain the parsed (and trimmed) data
> 
> udd=> select uploader, name, email, count(*) from uploaders where uploader 
> ilike '%tille%' group by 1,2,3;
>   uploader  |  name   |  email   | 
> count 
> +-+--+---
>  Andreas Tille| Andreas Tille   | ti...@debian.org |  
> 8785
>  Andreas Tille| Andreas Tille   | andr...@an3as.eu |
>  1
>  Andreas Tille| Andreas Tille   | ti...@debian.org |
>  1
> 
> So, just use name and/or email?

Well, I do not seek for a solution for this (non-)problem.  I simply
think that not stripping values from spaces before injecting these into
UDD is wrong.  I simply stumbled upon this when I did the query above.

I stumbled upon another reason which might be even worse:

select distinct done, done_name, done_email, owner, owner_name, owner_email 
from archived_bugs where done_name like '%"%' or owner_name like '%"%' order by 
done_name;
 done   
   |  done_name 
 |   done_email
|  owner
  | owner_name  
| owner_email  
---+-+-+-+-+--
 
   |
 | der...@debian.org   
| "vane...@gmail.com"
  | "vane...@gmail.com" 
| vane...@gmail.com


Re: UDD contains names where spaces and quotes are not stripped

2023-12-07 Thread Lucas Nussbaum
On 08/12/23 at 07:39 +0100, Andreas Tille wrote:
> Am Thu, Dec 07, 2023 at 08:36:12PM +0100 schrieb Lucas Nussbaum:
> > On 07/12/23 at 20:24 +0100, Andreas Tille wrote:
> > > Am Thu, Dec 07, 2023 at 07:59:38PM +0100 schrieb Lucas Nussbaum:
> > > > On 07/12/23 at 09:58 +0100, Andreas Tille wrote:
> > > > > 
> > > > > udd=> select '"' || u.name || '"' as name_with_spaces, uploader from 
> > > > > uploaders u where name like '% ' or name like ' %' ;
> > > > >  name_with_spaces | uploader  
> > > > > --+---
> > > > >  " Mehdi Dogguy"  |  Mehdi Dogguy 
> > > > >  " David Paleino" |  David Paleino 
> > > > >  " Stéphane Glondu"  |  Stéphane Glondu 
> > > > >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> > > > >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> > > > >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> > > > >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> > > > >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> > > > >  "Andreas Tille  "| Andreas Tille   
> > > > >  " LI Daobing"|  LI Daobing 
> > > > >  " David Paleino" |  David Paleino 
> > > > >  " Stefano Zacchiroli"|  Stefano Zacchiroli 
> > > > >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> > > > >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> > > > >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> > > > >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> > > > >  " Nikita V. Youshchenko" |  Nikita V. Youshchenko 
> > > > >  "Colin Tuckley " | Colin Tuckley  
> > > > >  "Colin Tuckley " | Colin Tuckley  
> > > > >  "Colin Tuckley " | Colin Tuckley  
> > > > > (20 rows)
> > > > > ...
> > > > >UPDATE uploaders SET name = trim(name), uploader = trim(name) || ' 
> > > > > ' || email WHERE name like ' %' or name like '% ' ;
> > > > > 
> > > 
> > > 
> > > BTW:  I found 
> > > 
> > > udd=> SELECT count(*), name FROM (SELECT CASE WHEN changed_by_name = '' 
> > > THEN maintainer_name ELSE changed_by_name END AS name FROM 
> > > upload_history) uh WHERE name ilike '%tille%'  group by name;
> > >  count | name  
> > > ---+---
> > >  16524 | Andreas Tille
> > > (1 Zeile)
> > > 
> > > So why do I have 8707 uploads per uploaders but 16524 per upload_history?
> 
> ???

upload_history contains all uploads ever made to Debian.
uploaders contains packages currently in the suite (not superseded by
another upload)

> > So, just use name and/or email?
> 
> Well, I do not seek for a solution for this (non-)problem.  I simply
> think that not stripping values from spaces before injecting these into
> UDD is wrong.  I simply stumbled upon this when I did the query above.

It has been like that for about 15 years. I'm not sure changing the API
because you think is wrong is a good idea.

> I stumbled upon another reason which might be even worse:
> 
> select distinct done, done_name, done_email, owner, owner_name, owner_email 
> from archived_bugs where done_name like '%"%' or owner_name like '%"%' order 
> by done_name;
>  done 
>  |  done_name 
>  |   done_email   
>  |  owner 
>  | 
> owner_name  | owner_email 
>  
> ---+-+-+-+-+--
>
>  |
>  | der...@debian.org  
>  | "vane...@gmail.com" 
>  | "vane...@gmail.com"
>  | vane...@gmail.com
>   
>  |
>  | twer...@debian.org 
>  | "Varun Hiremath"  
>  | "Varun Hiremath"   
>  | varunhirem...@gmail.co