Hi Laura! I've written the attached patch (untested).
I'm placing my changes under the following licence:
https://github.com/shlomif/shlomif-computer-settings/blob/master/shlomif-settings/git/commit-messages/cc0-copyright-disclaimer.txt

Note that the program is still written for Python 2.x, which has reached end of life. It should be ported to Python 3.x.

On Fri, 4 Sep 2020 21:41:40 +0200 Laura Arjona Reina <larj...@debian.org> wrote:
> Package: www.debian.org
> User: www.debian....@packages.debian.org
> Usertag: scripts
> Severity: normal
>
> Hi
>
> the scripts "urlcheck" generate this log in the /logos folder:
>
> Looking into http://www.debian.org/logos/openlogo.xcf.gz
> Error reading page: http://www.debian.org/logos/openlogo.xcf.gz
> Looking into http://www.debian.org/logos/officiallogo.xcf.gz
> Error reading page: http://www.debian.org/logos/officiallogo.xcf.gz
> Looking into http://www.debian.org/logos/officiallogo-nd.xcf.gz
> Error reading page: http://www.debian.org/logos/officiallogo-nd.xcf.gz
>
> I guess this means it tries to parse the xcf.gz files and probably we
> need to update the script to skip such files (compressed images).
>
> Anybody familiarised with Python, who can help?
>
> The code of the script is here:
>
> https://salsa.debian.org/webmaster-team/cron/-/tree/master/urlcheck
>
> (I guess the main script, urlcheck.py, is where maybe the fix should be
> made).
>
> The script is called by 3 cron jobs:
>
> 17 3 * * * cd /srv/www.debian.org/cron/urlcheck && ./run.urlcheck
> 36 12 * * * cd /srv/www.debian.org/cron/urlcheck && ./make.bad_link.pages
> 5 13 * * * cd /srv/www.debian.org/cron/urlcheck && ./cleanup.logs
>
> and the daily logs are here:
> https://www-master.debian.org/build-logs/urlcheck/
> (check logos folder).
>
> Kind regards

-- 
Shlomi Fish       https://www.shlomifish.org/
UNIX Fortune Cookies - https://www.shlomifish.org/humour/fortunes/

The cake was not a lie for Chuck Norris.
    - https://www.shlomifish.org/humour/bits/facts/Chuck-Norris/

Please reply to list if it's a mailing list post - https://shlom.in/reply .
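As a rough illustration of what the Python 3 port involves, here is a sketch of the append_from() function whose error path appears in the urlcheck.py hunk of the patch. Only the function name and the "Can't open" error path come from the patch; the body that reads one pattern per line is a hypothetical guess, and the parameter is renamed from `list` to `patterns` so it no longer shadows the built-in.

```python
import sys


def append_from(path, patterns):
    """Hypothetical Python 3 port: append every non-blank line of `path` to `patterns`.

    The real body of append_from() is not shown in the patch; the file-reading
    logic here is an assumption. Only the error path mirrors the hunk context.
    """
    try:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:
                    patterns.append(line)
    except OSError:
        # Python 3 requires print() as a function; the Python 2 code uses
        # the statement form: print "Can't open " + path
        print("Can't open " + path)
        sys.exit(1)
```

The main mechanical change is the print statement becoming a function call; getopt and sys.exit() behave the same under Python 3.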
diff --git a/urlcheck/run.urlcheck b/urlcheck/run.urlcheck
index b532eac..4f5749b 100755
--- a/urlcheck/run.urlcheck
+++ b/urlcheck/run.urlcheck
@@ -7,6 +7,7 @@ date=`date +%Y%m%d`
     --ignore debian.org/fom --ignore /releases/ --ignore /international/ --ignore /security/ \
     --ignore /devel/ --ignore /News/ --ignore /doc/ --ignore /distrib/ \
     --ignore /ports/ --ignore /intl/ \
+    --ignore '\.xcf\.(?:bz2|gz|xz)$' \
     http://www.debian.org/ >& logs/web.$date &
 ./urlcheck.py --require www.debian.org/international http://www.debian.org/international/ \
     >& logs/web.$date.intl &
diff --git a/urlcheck/urlcheck.py b/urlcheck/urlcheck.py
index a5c3909..e60aa78 100755
--- a/urlcheck/urlcheck.py
+++ b/urlcheck/urlcheck.py
@@ -229,6 +229,7 @@ def append_from(path, list):
         print "Can't open " + path
         sys.exit(1)
 
+ignore.append('\\.xcf\\.(?:bz2|gz|xz)$')
 options, args = getopt.getopt(sys.argv[1:], "", ["require=", "ignore=", "requirefrom=", "ignorefrom=", "non-compliant", "non-compliant-from="])
 for option in options:
     if option[0] == '--require':
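A quick way to sanity-check the pattern the patch adds is to run it against the URLs from the quoted log. This sketch assumes urlcheck.py applies its ignore entries with re.search(); that is not confirmed from the script, and if it does plain substring matching instead, the pattern would never match. The helper name should_skip() is hypothetical.

```python
import re

# Ignore list holding only the pattern added by the patch. The raw string
# r'\.xcf\.(?:bz2|gz|xz)$' is the same regex as '\\.xcf\\.(?:bz2|gz|xz)$'.
ignore = [r'\.xcf\.(?:bz2|gz|xz)$']


def should_skip(url, patterns):
    """Return True if any pattern matches the URL.

    Hypothetical helper: assumes ignore entries are applied with re.search().
    """
    return any(re.search(pattern, url) for pattern in patterns)


# URLs from the urlcheck log, plus the /logos/ page, which must still be checked.
urls = [
    "http://www.debian.org/logos/openlogo.xcf.gz",
    "http://www.debian.org/logos/officiallogo.xcf.gz",
    "http://www.debian.org/logos/officiallogo-nd.xcf.gz",
    "http://www.debian.org/logos/",
]
for url in urls:
    print(url, "skipped" if should_skip(url, ignore) else "checked")
```

Run as-is, the three .xcf.gz URLs are reported as skipped and the /logos/ page as checked. The trailing `$` anchor keeps the pattern from matching a page such as openlogo.xcf.gz.html.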