Hi,

On Fri, 3 May 2013, Bruce Dubbs wrote:

> I'm going to write a program to automatically identify out of date
> packages for LFS.  Has anyone already done such a beast?
I've been doing something like that for a couple of years now (including some 
BLFS and even Windows stuff as well ;-]). I started with a bunch of bash 
scripts that basically parsed certain maintainer websites with certain 
regexps. That was hard to read, neither fast nor flexible, and always out of 
date.

Current solution (which I've been happy with for quite some years now):
All the parsing is done by a single, simple C(++?) program now.
It basically follows _all_ links and handles general stuff like stripping 
common extensions (*.tgz etc.) or an appended "/download" and replacing 
"/from/a/mirror" with "/from/this/mirror".

As basic input it gets a list of simple rules to look for:
$packagename $starturl $pattern, e.g.:
mpc http://www.multiprecision.org/?prog=mpc&page=download tar.gz
check http://sourceforge.net/projects/check/files/check/ /tar.gz/download

$pattern in most cases only specifies the (sub/parent)directory depth to 
search in (the number of leading slashes) and the extension (or better: the 
end) of the links to look for there. It usually does not filter for any kind 
of naming or versioning scheme. As a result I get a list of 
directories/websites that were searched and a list of URLs to potentially 
download.
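Roughly, a rule is handled like this (only a sketch; the directory-depth 
handling is left out here and the suffix match is an approximation):

#include <sstream>
#include <string>

struct Rule { std::string name, starturl, pattern; };

// Parse one "$packagename $starturl $pattern" rule line.
bool parse_rule(const std::string& line, Rule& r)
{
    std::istringstream in(line);
    return static_cast<bool>(in >> r.name >> r.starturl >> r.pattern);
}

// Ignore the leading slashes (the depth part) and require that a found
// link ends with the rest of the pattern.
bool link_matches(const std::string& link, const Rule& r)
{
    std::string end = r.pattern;
    while (!end.empty() && end[0] == '/')
        end.erase(0, 1);
    return link.size() >= end.size() &&
           link.compare(link.size() - end.size(), end.size(), end) == 0;
}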

This would include following uninteresting links (such as parent dirs, 
adverts, subdirs of outdated versions, or subdirs of packages I'm not 
interested in). Therefore I keep a list of fully qualified 
directories/websites that the above C program must not search again (a small 
lookup sketch follows the examples), e.g.:
ftp://ftp.funet.fi:21/pub/mirrors/ftp.easysw.com/pub/cups/1.1.19/
ftp://ftp.funet.fi:21/pub/mirrors/ftp.easysw.com/pub/cups/1.1.20/
http://apache.osuosl.org/
http://creativecommons.org/licenses/by-sa/3.0/
http://jobs.sourceforge.net/
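Before descending into a directory, the program just checks that list; 
roughly (sketch only, names made up):

#include <string>
#include <unordered_set>

// Fully qualified dirs/sites never to be searched again,
// loaded from the list above.
std::unordered_set<std::string> excluded_dirs;

bool worth_searching(const std::string& dir_url)
{
    return excluded_dirs.count(dir_url) == 0;
}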

This gives me a list of package URLs, but it includes stuff that I'm not 
interested in (which just happens to come from the same directory/site) or 
stuff that I already have. Therefore I keep a list of such done packages 
with certain extensions stripped (to avoid fetching a tar.gz again later as 
a tar.xz; see the sketch after the examples), e.g.:
autoconf-2.52
autoconf-2.53
autoconf-2.54
linux-2.6.16.18-utf8_input-1.patch
linux-2.6.16.19
linux-2.6.16.19-utf8_input-1.patch
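The stripping is essentially this (sketch only, extension list abbreviated):

#include <string>

// Reduce a file name to its "done" key so that e.g. autoconf-2.54.tar.gz
// and autoconf-2.54.tar.xz map to the same entry.
std::string done_key(std::string name)
{
    static const char* exts[] = { ".tar.gz", ".tar.xz", ".tar.bz2", ".tgz", ".zip" };
    for (unsigned i = 0; i < sizeof(exts) / sizeof(exts[0]); ++i) {
        const std::string e = exts[i];
        if (name.size() > e.size() &&
            name.compare(name.size() - e.size(), e.size(), e) == 0) {
            name.erase(name.size() - e.size());
            break;
        }
    }
    return name;   // "autoconf-2.54.tar.xz" -> "autoconf-2.54"
}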

The C program has those 3 lists (currently 24KB commented rules, 120KB 
dirs done, 230KB packages done) in memory and can therefore filter results 
rapidly.
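Loading such a list file into memory is trivial, e.g. (just a sketch; I 
assume here that '#' starts a comment line in the rules file):

#include <fstream>
#include <string>
#include <unordered_set>

// Read one of the three list files into a hash set; filtering then
// boils down to cheap lookups.
std::unordered_set<std::string> load_list(const std::string& path)
{
    std::unordered_set<std::string> entries;
    std::ifstream in(path.c_str());
    std::string line;
    while (std::getline(in, line))
        if (!line.empty() && line[0] != '#')
            entries.insert(line);
    return entries;
}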

[You can add further sanity checks, like remembering the last time a certain 
rule produced any package URLs at all, or any new ones, as a hint to check 
whether the maintainer changed the website, extension or subdir structure; a 
sketch follows.]
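Such a staleness check could look roughly like this (made-up names, 
threshold arbitrary):

#include <ctime>
#include <string>
#include <unordered_map>

// Remember the last time a rule produced a (new) package URL and
// complain if that was too long ago.
std::unordered_map<std::string, std::time_t> last_hit;

bool rule_looks_stale(const std::string& rule_name, double max_days)
{
    auto it = last_hit.find(rule_name);
    if (it == last_hit.end())
        return true;
    return std::difftime(std::time(0), it->second) > max_days * 86400.0;
}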

So I automatically get a list of the subdirs currently being searched (and 
may exclude older versions, new uninteresting packages or new adverts from 
further searches), and I automatically get a list of new package URLs that I 
may either want to download or just mark as done (to skip missed 
intermediate versions or by-catch of packages I'm not interested in).

Example: the current list of new package URLs that I might be interested in 
downloading:
http://ftp.gnome.org/pub/gnome/sources/gtk+/3.9/gtk+-3.9.0.tar.xz
http://icedtea.wildebeest.org/download/source/icedtea-2.1.8.tar.gz
http://sourceforge.net/projects/libpng/files/libpng15/1.5.16beta02/libpng-1.5.16beta04.tar.xz/download
http://www.linuxfromscratch.org/blfs/downloads/svn/blfs-book-svn-html-2013-05-03.tar.bz2
http://www.linuxfromscratch.org/lfs/downloads/development/LFS-BOOK-SVN-20130501.tar.bz2

Surely not perfect, but easy to maintain and does the job for me...

Uwe