Anthony Towns <[EMAIL PROTECTED]> writes:

> On Tue, Jun 05, 2007 at 06:28:53PM +0900, Charles Plessy wrote:
>> On Tue, Jun 05, 2007 at 10:09:07AM +0200, Michael Hanke wrote:
>> > My question is now: Is it reasonable to provide this rather huge amount
>> > of data in a package in the archive?
>> many thanks for bringing this crucial question on -devel. In my field, I
>> wish that it would be possible to apt-get install the human genome for
>> instance.
>
> Are either of you going to debconf, or able to point out some example
> large (free?) data sets that should be packaged like this as a test case
> for playing with over debconf?

The NCBI non-redundant database (nr).  Having this packaged and
frequently updated (maybe in volatile) would be fantastic.

There are also quite a number of other popular databases used in
bioinformatics, genomics, proteomics and other biological fields which
would be really nice to have in Debian.  Here's a selection:

  ftp://ftp.ncbi.nih.gov/blast/db/
  ftp://ftp.ncbi.nih.gov/refseq/
  ftp://ftp.ncbi.nih.gov/repository/
  ftp://ftp.ncbi.nih.gov/pub/taxonomy/

Because these are all in standard formats, it might even be possible to
have updated packages generated and uploaded semi-automatically.

These would be really useful in conjunction with much of the
bioinformatics software already available in Debian, which could make
good use of them if they were put in standardised locations.

As has been mentioned previously, a separate archive section that
mirrors could skip would be nice.  Together, these databases are
eye-wateringly huge, especially when uncompressed.


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.