Hi Ansgar,

On Thu, May 21, 2015 at 11:59:39AM +0200, Ansgar Burchardt wrote:
>
> I don't know exactly why this changed (maybe different default in
> Apache?),
Probably. I just wanted to know whether this is an *intentional* change or an
accident that will be reverted.

> but scraping web pages seems a suboptimal way to gather
> information.
>
> There is [1] with machine-readable information about packages in NEW.

I agree, but the information available in a machine-readable format is not
sufficient (unless you consider HTML machine-readable). When I wrote the
machine-readable gatherer, there was discussion about creating individual
<package>-<version>.822 files, but this never happened. (In fact, the
gatherer has a never-used feature to export those individual .822 files.)

The patch below copes with the new situation, but before I activate it, it
would be nice to have some confirmation that the latest change is permanent.
Hmmm, maybe I will commit it anyway, since it serves basically the same
purpose and is robust against similar changes in the future.

Kind regards

       Andreas.

> [1] <https://ftp-master.debian.org/new.822>

$ git diff
diff --git a/scripts/fetch_ftpnew.sh b/scripts/fetch_ftpnew.sh
index 3acb421..fd26f1f 100755
--- a/scripts/fetch_ftpnew.sh
+++ b/scripts/fetch_ftpnew.sh
@@ -4,9 +4,6 @@ mkdir -p $TARGETDIR
 rm -rf $TARGETDIR/*
 wget -q http://ftp-master.debian.org/new.822 -O ${TARGETDIR}/new.822
 cd $TARGETDIR
-wget -q -r -N --level=2 --no-parent --no-directories http://ftp-master.debian.org/new/
-# Some large packages do contain e huge list of files which just consumes space in our
-# cache - so simply delete these entries which are of no use here
-# sed -i '/^[-dlrwx]\+ root\/root/d' ${TARGETDIR}/*.html
-# Finally it might be better to keep originals ...
-rm -f $TARGETDIR/index.html*
+for newhtml in `wget -q -O- http://ftp-master.debian.org/new.html | grep '^<a href="new/.*\.html' | sed 's?^<a href="\(new/.*\.html\).*?http://ftp-master.debian.org/\1?'` ; do
+    wget -q $newhtml
+done

-- 
http://fam-tille.de

Archive: https://lists.debian.org/20150521102656.gg29...@an3as.eu
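The new.822 file referenced as [1] is presumably in RFC 822 / deb822 style: stanzas of `Field: value` lines separated by blank lines. A minimal sketch of the per-package split discussed above (one <package>-<version>.822 file per stanza) could look like the following. It assumes each stanza carries `Source:` and `Version:` fields and uses only the first word of `Version:` if several versions are listed; the function name `split_new822` is hypothetical, and this is not the gatherer's actual export feature.

```sh
#!/bin/sh
# Hypothetical helper: split a deb822-style file (stanzas separated by
# blank lines) into one <source>-<version>.822 file per stanza.
# ASSUMPTION: every stanza has "Source:" and "Version:" fields; when
# "Version:" lists several versions, only the first is used in the name.
split_new822() {
    infile=$1
    outdir=$2
    mkdir -p "$outdir"
    awk -v outdir="$outdir" '
        # Write the collected stanza to its own file, then reset state.
        function flush() {
            if (stanza != "" && src != "" && ver != "") {
                file = outdir "/" src "-" ver ".822"
                printf "%s", stanza > file
                close(file)
            }
            stanza = ""; src = ""; ver = ""
        }
        # A blank (or whitespace-only) line ends a stanza.
        NF == 0 { flush(); next }
        {
            stanza = stanza $0 "\n"
            if ($1 == "Source:")  src = $2
            if ($1 == "Version:") ver = $2
        }
        # The last stanza may not be followed by a blank line.
        END { flush() }
    ' "$infile"
}
```

For example, `split_new822 new.822 split` would write files such as `split/<source>-<version>.822`, one per package in NEW.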
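The link extraction in the patch can be tried on saved HTML without touching the network if the same grep/sed pipeline is factored into a function. The sketch below does that; `extract_new_links`, `fetch_new_pages` and `BASE_URL` are hypothetical names, not part of scripts/fetch_ftpnew.sh. One deliberate change: the capture uses `[^"]*` instead of `.*`, so a line containing several links yields the first URL rather than a greedy match running up to the last `.html` on the line. A `while read` loop also avoids word splitting of the backtick output. Like the patch, it assumes each package link in new.html starts a line with `<a href="new/...html`, which is exactly the kind of layout detail that could change again.

```sh
#!/bin/sh
# Sketch only: same extraction as the patch, split into testable pieces.
BASE_URL=http://ftp-master.debian.org

extract_new_links() {
    # Read HTML on stdin and print one absolute URL per line that starts
    # with a link into new/. [^"]* stops the capture at the closing quote.
    grep '^<a href="new/.*\.html' |
        sed 's?^<a href="\(new/[^"]*\.html\).*?'"$BASE_URL"'/\1?'
}

fetch_new_pages() {
    # Download every per-package page listed in new.html into the current
    # directory; -N only re-fetches a page if the remote copy is newer.
    wget -q -O- "$BASE_URL/new.html" | extract_new_links |
        while read -r url; do
            wget -q -N "$url"
        done
}
```

Saved HTML can be checked with `extract_new_links < new.html`, and `fetch_new_pages` could be called inside `$TARGETDIR` in place of the `for` loop.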