On Tue, Jun 30, 2020 at 09:19:31AM +0200, Baptiste BEAUPLAT wrote:
> On 6/29/20 11:34 PM, Raphael Hertzog wrote:
> >> The duck worker has to process around 460000 urls (only counting
> >> Homepage) in less than 24h.
> >
> > How do you get to that figure? We don't have that many source package
> > and even if you consider multiple URL for each source package due to
> > changes over time (in multiple releases), that makes way too many URLs
> > per source package.
>
> Err, sorry about that. That figure is the result of:
>
> $ curl -s
> http://deb.debian.org/debian/dists/unstable/main/source/Sources.gz |
> zgrep -v Homepage: | sort -u | wc -l
> 458804
>
> Which is obviously wrong. Here is the real number:
>
> $ curl -s
> http://deb.debian.org/debian/dists/unstable/main/source/Sources.gz |
> zgrep Homepage: | sort -u | wc -l
> 26250

Just a note before you head toward implementing that: the Homepage field
is similar to Section, in that it can also be specified in the binary
paragraphs, not just the source paragraph.  You can see that from the
Homepage field being present in the DEBIAN/control file of the .debs,
and clearly that value might differ from the Homepage in the .dsc.

So please, look harder for Homepage, not just in the first paragraph of
d/control ;)

-- 
regards,
                        Mattia Rizzolo

GPG Key: 66AE 2B4A FCCF 3F52 DA18  4D18 4B04 3FCD B944 4540      .''`.
More about me: https://mapreri.org                              : :'  :
Launchpad user: https://launchpad.net/~mapreri                  `. `'`
Debian QA page: https://qa.debian.org/developer.php?login=mattia  `-
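
P.S. To make the point about binary paragraphs concrete, here is a rough
sketch of scanning every paragraph of a d/control for Homepage, instead of
only the first one.  It is only an illustration, not how duck or the tracker
actually does it: the function names (parse_paragraphs, collect_homepages)
and the SAMPLE control file are made up for this example, and the parser is
a simplified stdlib-only take on the control file syntax.  A real
implementation would probably rely on an existing deb822 parser.

```python
def parse_paragraphs(text):
    """Split control-file text into a list of {field: value} dicts.

    Simplified: field names are lowercased (they are case-insensitive),
    continuation lines are appended to the previous field, and comment
    lines starting with '#' are skipped.
    """
    paragraphs, current, last = [], {}, None
    for line in text.splitlines():
        if line.startswith("#"):
            continue
        if not line.strip():
            # A blank line ends the current paragraph.
            if current:
                paragraphs.append(current)
            current, last = {}, None
            continue
        if line[0] in " \t":
            # Continuation of the previous field's value.
            if last is not None:
                current[last] += "\n" + line.strip()
            continue
        name, _, value = line.partition(":")
        last = name.strip().lower()
        current[last] = value.strip()
    if current:
        paragraphs.append(current)
    return paragraphs


def collect_homepages(text):
    """Return (kind, name, homepage) for every paragraph that sets Homepage."""
    found = []
    for para in parse_paragraphs(text):
        if "homepage" not in para:
            continue
        # Assumption: a paragraph with a Package field is a binary paragraph.
        kind = "binary" if "package" in para else "source"
        name = para.get("package") or para.get("source", "?")
        found.append((kind, name, para["homepage"]))
    return found


# Hypothetical d/control where one binary package overrides Homepage.
SAMPLE = """\
Source: foo
Homepage: https://example.org/foo

Package: foo
Architecture: any
Description: main tool
 Longer description.

Package: foo-plugin
Architecture: all
Homepage: https://example.org/foo-plugin
Description: plugin
"""

if __name__ == "__main__":
    for kind, name, url in collect_homepages(SAMPLE):
        print(kind, name, url)
```

With the sample above, this reports one Homepage for the source paragraph
and a different one for the foo-plugin binary paragraph, which is exactly
the case that looking only at the first paragraph would miss.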