On Wednesday 06 June 2007 13:00:19 Andreas Tille wrote:
> On Wed, 6 Jun 2007, Tim Cutts wrote:
> 0. Find a solution for large data sets in general
> 1. Find a solution for static biological data (I couldn't believe
>    that all biological data are really changing that frequently).
> 2. Find a solution that might make the kind of handling of
>    dynamical data as you described more user friendly (bittorrent).
Not all data is updated at the bimonthly Ensembl pace, nor is it all as big as Ensembl. But the most interesting data is :o)

...
> > software which builds and then presents http://www.ensembl.org)
> > 4) Maintaining our own package repository
> > 5) Migration from Tru64 to Debian
...
> >
> > Feel free to suggest to me things that you'd find interesting to talk
>
> I personally would be mostly interested in top 4 (Maintaining our own
> package repository).

It would be lovely if we could agree on a set of databases to support in Debian and on a permanent location in the file system for them.

For the reasons Tim has already outlined, I do not see a point in distributing the larger databases as Debian packages. Once a (computational) biologist starts a new project, (s)he wants the latest data no matter what, and anything older than three months (sometimes even a week) is unlikely to be acceptable. I do not see how any packaging effort could keep up with that, and particularly not in the way we think of the stable distribution.

What may be stable, though, is an application that installs the latest databases for the user. Maybe that application would even know how to use the diffs against the latest release that many databases, such as EMBL, offer, in order to reduce download times (we are talking about many gigabytes for these big players).

I could well imagine that an application maintaining the most important databases from, say, Nucleic Acids Research's January issue could be publishable, and it might be a nice project for a summer student to start off. Any volunteers on this list by any chance?

I am not certain how other packages should reference such an auto-maintained database. Maybe there could be something like virtual packages that depend on the auto-biodb-maintenance tool and call it in their postinst scripts as

  $ auto-biodb-maintenance --make-sure-it-is-maintained dbname

Many greetings

Steffen
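To make the postinst idea above more concrete, here is a minimal Python sketch of what the registration step of such a tool could look like. Everything in it is hypothetical: no such Debian tool exists, the `--make-sure-it-is-maintained` option is taken from the proposal, and the `--registry` option and the registry path `/var/lib/auto-biodb/maintained.list` are illustrative assumptions. Downloading data and applying release diffs are deliberately left out. The call is idempotent, because a postinst script may run more than once (for example on reinstall or reconfigure).

```python
"""Hypothetical sketch of the registration step of an auto-biodb-maintenance tool.

The tool name, the --registry option and the registry path are assumptions
made for illustration; they are not part of any existing Debian package.
"""
import argparse
import os
import sys

# Hypothetical default location of the list of databases to keep up to date.
DEFAULT_REGISTRY = "/var/lib/auto-biodb/maintained.list"


def ensure_maintained(registry_path: str, dbname: str) -> bool:
    """Record dbname in the registry; return True if it was newly added.

    Idempotent: registering the same database twice leaves a single entry,
    so a postinst script can safely call it on every run.
    """
    entries = set()
    if os.path.exists(registry_path):
        with open(registry_path, encoding="utf-8") as fh:
            entries = {line.strip() for line in fh if line.strip()}
    if dbname in entries:
        return False
    entries.add(dbname)
    os.makedirs(os.path.dirname(registry_path) or ".", exist_ok=True)
    # Rewrite the whole file, one database name per line, sorted.
    with open(registry_path, "w", encoding="utf-8") as fh:
        fh.write("".join(name + "\n" for name in sorted(entries)))
    return True


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(prog="auto-biodb-maintenance")
    parser.add_argument("--make-sure-it-is-maintained", metavar="DBNAME",
                        dest="dbname", required=True)
    parser.add_argument("--registry", default=DEFAULT_REGISTRY)
    args = parser.parse_args(argv)
    added = ensure_maintained(args.registry, args.dbname)
    print(f"{args.dbname}: {'registered' if added else 'already maintained'}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

A postinst would then run `auto-biodb-maintenance --make-sure-it-is-maintained embl`, and a separate updater (for instance a cron job, again an assumption) would read the registry and fetch or update each listed database, using release diffs where the provider offers them.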