dueff...@uwe-dueffert.de wrote:
> Hi,
>
> On Fri, 3 May 2013, Bruce Dubbs wrote:
>
>> I'm going to write a program to automatically identify out of date
>> packages for LFS. Has anyone already done such a beast?
>
> I'm kind of doing that for a couple of years now (including some BLFS and
> even Windows stuff as well ;-]). I started with a bunch of bash scripts
> that basically parsed certain maintainer websites with certain regexps.
> This was quite hard to read, neither fast nor flexible and always out of
> date.
>
> Current solution (that I'm happy with for quite some years):
> All parsing stuff is done by a simple single C(++?) program now.
> It basically follows _all_ links and handles general stuff like stripping
> common extensions (*.tgz etc) or an appended "/download" and replacing
> "/from/a/mirror" by "/from/this/mirror".

I'm using php. It is generally easier to maintain than C/C++. php can do
anything C can and the computation time is not an issue for this type of
application. The most time used will be fetching directory listings from
remote sites.

> As basic input it gets a list of simple rules to look for:
> $packagename $starturl $pattern, e.g.:
> mpc http://www.multiprecision.org/?prog=mpc&page=download tar.gz
> check http://sourceforge.net/projects/check/files/check/ /tar.gz/download
>
> $pattern in most cases only specifies the (sub/parent)directory depth to
> search in (number of leading slashes) and the extension (or better: end)
> of the links to look for there. It usually does not filter for any kind of
> naming or versioning scheme. As a result I get a list of
> directories/websites searched in and a list of URLs to potentially
> download.
>
> This would include following uninteresting links (such as parent dirs or
> adverts or subdirs of outdated versions or subdirs of packages I'm not
> interested in).
> Therefore I keep a list of fully qualified
> directories/websites not to be searched by the above C program again, e.g.:
> ftp://ftp.funet.fi:21/pub/mirrors/ftp.easysw.com/pub/cups/1.1.19/
> ftp://ftp.funet.fi:21/pub/mirrors/ftp.easysw.com/pub/cups/1.1.20/
> http://apache.osuosl.org/
> http://creativecommons.org/licenses/by-sa/3.0/
> http://jobs.sourceforge.net/

Yes, I may use a variation of that.

> This would give me a list of package URLs, but include stuff that I'm not
> interested in (which just happens to come from the same directory/site) or
> stuff that I already have. Therefore I keep a list of such done packages
> with certain extensions stripped (to avoid getting a tar.gz as tar.xz
> again), e.g.:
> autoconf-2.52
> autoconf-2.53
> autoconf-2.54
> linux-
> linux-
> linux-

Actually, I want to know if an xz version exists. My order of preference
is xz, bz2, gz. All the packages in LFS are one of those. I haven't
looked at BLFS yet.

> The C program has those 3 lists (currently 24KB commented rules, 120KB
> dirs done, 230KB packages done) in memory and can therefore filter results
> rapidly.

I agree that the memory requirement is not particularly large and few
items, if any, beyond the final results need to be written out.

> [You can add further sanity checks like remembering when a certain rule
> resulted in package URLs at all or in new package URLs for the last time
> to hint at taking a look whether the maintainer changed website, extension
> or subdir structure.]
>
> So I automatically get a list of subdirs currently searched (and may
> exclude older versions or new uninteresting packages or new adverts from
> further search) and I automatically get a list of new package URLs that I
> may either want to download or just mark as done (for skipping missed
> intermediate versions or by-catch of packages I'm not interested in).
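
To make the filtering step concrete, here is a minimal sketch in Python
(neither the C program nor the PHP one is shown in this thread, so this is
not either author's code). It strips an appended "/download" and the
archive extension to get a package key, drops keys that are on the "done"
list, and keeps the preferred format (xz, then bz2, then gz) when several
exist. The names split_archive, new_packages, ARCHIVE_EXTS and PREFERENCE
are made up for illustration, and the extension list is an assumption.

```python
from urllib.parse import urlparse

# Archive endings to strip. Assumption: only the formats named in the
# thread (tgz, tar.gz, tar.bz2, tar.xz) matter here.
ARCHIVE_EXTS = (".tar.bz2", ".tar.gz", ".tar.xz", ".tgz")

# Stated order of preference: xz, then bz2, then gz (lower is better).
PREFERENCE = {".tar.xz": 0, ".tar.bz2": 1, ".tar.gz": 2, ".tgz": 2}


def split_archive(url):
    """Return (package_key, extension) for a candidate URL, or None."""
    path = urlparse(url).path
    if path.endswith("/download"):  # SourceForge-style suffix
        path = path[: -len("/download")]
    name = path.rsplit("/", 1)[-1]
    for ext in ARCHIVE_EXTS:
        if name.endswith(ext):
            return name[: -len(ext)], ext
    return None  # not an archive link we recognise


def new_packages(candidates, done_packages):
    """Map each not-yet-done package key to its preferred archive URL."""
    best = {}
    for url in candidates:
        parsed = split_archive(url)
        if parsed is None:
            continue
        key, ext = parsed
        if key in done_packages:  # set lookup, so the check stays cheap
            continue
        if key not in best or PREFERENCE[ext] < PREFERENCE[best[key][0]]:
            best[key] = (ext, url)
    return {key: url for key, (ext, url) in best.items()}
```

With done_packages = {"autoconf-2.52"} and candidates containing both
foo-1.0.tar.gz and foo-1.0.tar.xz from a hypothetical example.org, only
the foo-1.0.tar.xz URL comes back. The list of directories not to search
again would be a second set, checked before a directory is fetched.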
>
> Example: current list of new package URLs that I might potentially be
> interested in downloading:
> http://ftp.gnome.org/pub/gnome/sources/gtk+/3.9/gtk+-3.9.0.tar.xz
> http://icedtea.wildebeest.org/download/source/icedtea-2.1.8.tar.gz
> http://sourceforge.net/projects/libpng/files/libpng15/1.5.16beta02/libpng-1.5.16beta04.tar.xz/download
> http://www.linuxfromscratch.org/blfs/downloads/svn/blfs-book-svn-html-2013-05-03.tar.bz2
> http://www.linuxfromscratch.org/lfs/downloads/development/LFS-BOOK-SVN-20130501.tar.bz2

This does give me a couple of ideas to play with. Thanks.

  -- Bruce
--
http://linuxfromscratch.org/mailman/listinfo/lfs-dev
FAQ: http://www.linuxfromscratch.org/faq/
Unsubscribe: See the above information page