On Mon, 13 Sept 2021 at 23:56, Dave Fisher <wave4d...@comcast.net> wrote:
>
> Once podlings report their download page something like this can be
> incorporated in clutch3 which will have svn log info from dist.

Note that there are quite a few other checks that are needed for a
compliant download page, e.g.
- no references to nightly or snapshot builds
- no reference to repository.apache.org
- all releases have sigs and hashes (and vice-versa)
- KEYS file link is present and correct
- code verification instructions are present
- no md5 or sha1 hashes
- ...

The ruby script at
https://github.com/apache/whimsy/blob/master/tools/download_check.rb
does this, as well as checking that links actually work, i.e. that
mirrors have the files.
[Sorry, but it's not very well structured at present...]

Of course neither script is likely to work if the page uses JavaScript.

> Thanks
>
> Sent from my iPhone
>
> > On Sep 13, 2021, at 3:46 PM, Justin Mclean <jus...@classsoftware.com> wrote:
> >
> > Hi,
> >
> > A while back I wrote a script to check podling download links and we
> > attempted to get them all corrected. You need to manually list all of
> > the download pages.
> >
> > Things it doesn’t do:
> > - check if the latest release is there
> > - check if the contents match with what is in /dist
> >
> > Might be time to run it again.
> >
> > Here’s the python code, you might find it useful.
> >
> > from bs4 import BeautifulSoup
> > import urllib.request
> > import re
> >
> > # Download pages to check; these have to be listed by hand.
> > downloadPages = [
> >     "https://mxnet.apache.org/get_started/download"
> > ]
> >
> > for page in downloadPages:
> >     response = urllib.request.urlopen(page)
> >     data = response.read()
> >     soup = BeautifulSoup(data, 'lxml')
> >
> >     print()
> >     print("Checking " + page)
> >
> >     alllinks = soup('a')
> >     missing = True  # stays True until a signature or hash link is seen
> >     for link in alllinks:
> >         if link.has_attr('href'):
> >             href = link['href']
> >             text = link.contents
> >             # Release artifacts should be downloaded via closer.lua.
> >             if href.endswith(('.zip', '.tar.gz', '.tgz', '.msi', '.rpm')):
> >                 if href.startswith(('http://www.apache.org/dist/',
> >                                     'https://www.apache.org/dist/')):
> >                     print("Please change link to " + href + " to not use "
> >                           "http://www.apache.org/dist/ and use "
> >                           "https://www.apache.org/dyn/closer.lua instead")
> >                 if href.startswith(('http://downloads.apache.org/',
> >                                     'https://downloads.apache.org/')):
> >                     print("Please change link to " + href + " to not use "
> >                           "http://downloads.apache.org/ and use "
> >                           "https://www.apache.org/dyn/closer.lua instead")
> >                 if href.startswith(('http://dist.apache.org/repos/dist/dev',
> >                                     'https://dist.apache.org/repos/dist/dev')):
> >                     print("Please change link to " + href + " to release "
> >                           "area and use https://www.apache.org/dyn/closer.lua")
> >                 if href.startswith(('http://dist.apache.org/repos/dist/release',
> >                                     'https://dist.apache.org/repos/dist/release')):
> >                     print("Please use https://www.apache.org/dyn/closer.lua "
> >                           "to download releases")
> >                 if href.startswith('https://downloads.apache.org/incubator/'):
> >                     print("Please use https://www.apache.org/dyn/closer.lua "
> >                           "to download releases")
> >             # Signatures and hashes must not be served via mirrors.
> >             if href.endswith(('.sha512', '.sha256', '.asc')):
> >                 missing = False
> >                 if href.startswith(('http://www.apache.org/dist/',
> >                                     'https://www.apache.org/dist/')):
> >                     print("Please change link to " + href + " to go via "
> >                           "https://downloads.apache.org/. "
> >                           "https://www.apache.org/dist/ has been deprecated.")
> >                 if not href.startswith(('https://downloads.apache.org/',
> >                                         'https://archive.apache.org/dist')):
> >                     print("Please change link to " + href + " to go via "
> >                           "https://downloads.apache.org/ or "
> >                           "https://archive.apache.org/dist")
> >             if href.endswith('.sha'):
> >                 print("for link " + href + " .sha should no longer be used. "
> >                       "Please change to use .sha256 or .sha512.")
> >     if missing:
> >         print("Links to signatures and hashes are missing")
> >
> > Kind Regards,
> > Justin
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
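For illustration, a few of the additional checks listed above (no snapshot/nightly or repository.apache.org references, no md5/sha1 hashes, sigs and hashes matching releases, KEYS link present) could be sketched in Python roughly as follows. This is only a sketch and not how download_check.rb does it: the function and constant names are made up for the example, it works on a plain list of hrefs that has already been extracted from the page, and it assumes that a release and its .asc/.sha256/.sha512 files share the same file name at the end of the URL. It does not check that links resolve, that the KEYS link is correct, or that verification instructions are present.

```python
# Hypothetical helper names; not taken from download_check.rb.
ARTIFACT_EXTS = ('.zip', '.tar.gz', '.tgz', '.msi', '.rpm')
WEAK_HASH_EXTS = ('.md5', '.sha1')
STRONG_HASH_EXTS = ('.sha256', '.sha512')


def basename(href):
    """Return the last path segment of a link."""
    return href.rsplit('/', 1)[-1]


def check_links(hrefs):
    """Return a list of problem descriptions for the given hrefs."""
    hrefs = list(hrefs)
    problems = []
    artifacts, sigs, hashes = set(), set(), set()

    for href in hrefs:
        lower = href.lower()
        name = basename(href)
        if 'repository.apache.org' in lower:
            problems.append("reference to repository.apache.org: " + href)
        if 'snapshot' in lower or 'nightly' in lower:
            problems.append("reference to nightly or snapshot build: " + href)
        if lower.endswith(WEAK_HASH_EXTS):
            problems.append("md5/sha1 hash should not be used: " + href)
        elif lower.endswith('.asc'):
            sigs.add(name[:-len('.asc')])
        elif lower.endswith(STRONG_HASH_EXTS):
            hashes.add(name.rsplit('.', 1)[0])
        elif lower.endswith(ARTIFACT_EXTS):
            artifacts.add(name)

    # Every release needs a signature and a hash, and vice versa.
    for name in sorted(artifacts - sigs):
        problems.append("no signature for " + name)
    for name in sorted(artifacts - hashes):
        problems.append("no sha256/sha512 hash for " + name)
    for name in sorted((sigs | hashes) - artifacts):
        problems.append("signature or hash without a release: " + name)

    if not any(basename(h) == 'KEYS' for h in hrefs):
        problems.append("KEYS file link is missing")
    return problems
```

The hrefs could come from the same `soup('a')` loop as in the script above, e.g. `check_links(a['href'] for a in soup('a') if a.has_attr('href'))`. As with both scripts, this will not see links that the page generates with JavaScript.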