Re: Gnumed for arm

2008-04-06 Thread Sebastian Hilbert
On Monday, 31 March 2008, Andreas Tille wrote:
> On Mon, 31 Mar 2008, Sebastian Hilbert wrote:
> > This is what I mean
> >
> > http://qa.debian.org/debcheck.php?dist=testing&package=gnumed-client
>
> Ahh, OK, that explains the situation.
>
> > It is not available for the arm architecture, is it?
> >
> > Take file for example from here http://packages.debian.org/lenny/file
> > It shows more architectures including arm.
> >
> > So I believe there is no repository I could apt-get from, right?
>
> No, this conclusion is wrong, because you are able to install a package
> without recommends. Have a look at bug #473089
>
>  http://bugs.debian.org/473089
>
> and the patch I provided to this bug report:
>
>  apt-get -o APT::Install-Recommends=false install gnumed-client
>
I added the following repository to my N810 sources.list

http://armel-debs.applieddata.net/debian/dists/testing/

Then I issued apt-get install gnumed-client.

This led to: unmet dependencies on gnumed-common.

So I issued apt-get install gnumed-common.

Which led to: PreDepends: adduser but it is not going to be installed

So I issued apt-get install adduser

Which led to: Depends: passwd (>= 1:4.0.12) but 1:4.0.3-31sarge5.osso7 is to
be installed

Is there anything that can be done from your side? Thanks.


-- 
Sebastian Hilbert 
Leipzig / Germany
[www.gnumed.de]  -> PGP welcome, HTML ->/dev/null



Re: Google SoC (Bio DB manager)

2008-04-06 Thread Aidan Findlater
Hi guys,

I was hoping those who are interested might offer some constructive
criticism on my application:

Abstract:

Bioinformatics research requires the processing of large amounts of
biological data. Because of the sheer quantity of data analysed, most
researchers must run local mirrors of the databases that they use.
Unfortunately, local mirrors can be intimidating to set up and tedious to
maintain. Researchers may choose to use older versions of the datasets
involved out of laziness or fear of breaking their current scripts, or they
may choose to forego large-scale analyses altogether, especially if they
have less experience with systems administration.

I propose to solve this problem by creating a tool that will automate the
process of finding, installing, updating, and indexing mirrors of biological
databases. It will resolve dependencies, such as datasets that are mapped to
other datasets and programs that are required for indexing. The tool should
allow users to maintain multiple versions of the databases, as some analyses
may be linked to specific revisions of the data. As well, it should automate
migration of the datasets from one directory or volume to another, for cases
where hard disk space is limited.

Ideally, biological database mirroring will be made easy enough that it can
be used by anyone familiar with Debian's existing tools. Not only will
current researchers be more likely to use the most up-to-date
biological data, but others who were previously deterred by the inherent
difficulties of maintaining such mirrors may be encouraged to pursue
large-scale data analyses.

Debian is one of the most popular and stable GNU/Linux distributions, and
already provides the base for popular bioinformatics-targeted distributions
such as Debian-Med, DNALinux, and Bio-Linux. Debian currently leads in both
the quality and quantity of bioinformatics packages. It represents the ideal
platform on which to build such a tool. Conversely, such a tool would also
help to solidify Debian as the standard bioinformatics platform.

Theoretically, the application is not limited to biological databases. It
could readily be extended to any situation that requires local mirrors of
large data sets, such as those used in astronomy. Other future development
might also add a GUI to make it more user-friendly.

Detailed Description:

Introduction

Advances in the automation of biological experimentation and data collection
have led to an explosion in the size and number of biological databases.
Although data clearinghouses such as GenBank, EMBL, and DDBJ facilitate the
dissemination of such data, any large-scale bioinformatics analysis requires
local mirrors of the relevant databases. The extreme size and volatility of
the data sets involved have prevented them from being integrated into the
standard Debian package management system. Manually finding, installing,
updating, and indexing such databases is a daunting task for any system
administrator, much less a researcher with limited time and computer
training.


Proposed Project

The project is the creation of a tool to automate the life cycle of
biological databases, from installation to removal. It should be usable by
those with limited technical experience. Its various proposed uses are as
follows:

Select:
Database selection from a list
Version selection, if appropriate
Dependency checking for other databases and/or database versions
Dependency checking for installed programs (especially important for the
"processing" step below)
Install:
Download
Extract
Process: load into MySQL, index for BLAST, etc.
Clean up: remove any remaining downloaded files
Update:
Check for new versions of installed datasets
Install updated sets without removing old versions
Remove:
Remove data that resulted from processing: drop MySQL tables, delete
indices, etc.
Remove extracted files
Reinstall:
Remove and install again
Relocate:
Either a simple "mv" or a reinstall into a new location
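The lifecycle above can be sketched as a small in-memory manager. This is only an illustration of the proposed design, not an existing tool; the class name, the example databases, and their dependencies are all hypothetical:

```python
# Minimal sketch of the proposed database lifecycle manager.
# All names here (BioDbManager, ensembl, homology-map) are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Database:
    name: str
    version: str
    depends: list = field(default_factory=list)  # names of required databases

class BioDbManager:
    def __init__(self):
        self.available = {}   # (name, version) -> Database, the "Select" list
        self.installed = {}   # (name, version) -> Database

    def register(self, db):
        self.available[(db.name, db.version)] = db

    def install(self, name, version):
        db = self.available[(name, version)]
        # Dependency checking: refuse to install until prerequisites exist.
        for dep in db.depends:
            if not any(n == dep for n, _ in self.installed):
                raise RuntimeError(f"unmet dependency: {dep}")
        # Versioned install: old versions stay, e.g. ensembl.v38 and .v39.
        self.installed[(name, version)] = db

    def update(self, name):
        # Install the newest registered version without removing old ones.
        versions = sorted(v for n, v in self.available if n == name)
        self.install(name, versions[-1])

    def remove(self, name, version):
        del self.installed[(name, version)]

mgr = BioDbManager()
mgr.register(Database("ensembl", "v38"))
mgr.register(Database("ensembl", "v39"))
mgr.register(Database("homology-map", "v1", depends=["ensembl"]))
mgr.install("ensembl", "v38")
mgr.update("ensembl")              # v39 installed alongside v38
mgr.install("homology-map", "v1")  # dependency on ensembl is satisfied
print(sorted(mgr.installed))
```

A real implementation would of course also perform the download, extraction, and processing steps; this sketch only shows the version and dependency bookkeeping.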


Other considerations: Because analyses may be linked to specific versions,
each version will have its own separate installation, e.g. both ensembl.v38
and ensembl.v39. As well, each database will have very different
post-extraction processing, with some being indexed for BLAST, some being
loaded into a local SQL database, and others having nothing done at all.
This problem is compounded by the lack of common data storage formats. A
significant amount of hand-coding may be required for each of the different
databases' installation steps.
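The per-database processing step could be expressed as a table of hooks, one per database. Again, the database names and hook bodies below are hypothetical placeholders, not a real configuration:

```python
# Sketch of per-database post-extraction processing.
# Database names and hook actions are hypothetical examples.

def index_for_blast(path):
    # A real hook would shell out to a BLAST indexing tool here.
    return f"indexed {path} for BLAST"

def load_into_sql(path):
    # A real hook would load the extracted dump into a local SQL server.
    return f"loaded {path} into MySQL"

# Each database maps to its own processing step; many need none at all.
PROCESS_HOOKS = {
    "nr": index_for_blast,
    "ensembl": load_into_sql,
}

def process(db_name, extracted_path):
    hook = PROCESS_HOOKS.get(db_name)
    return hook(extracted_path) if hook else "no processing needed"

print(process("nr", "/data/nr"))
print(process("pdb", "/data/pdb"))
```

The hand-coding mentioned above would then be confined to writing one hook per database, which keeps the core download/version logic generic.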

Timeline

May: Community bonding period
June: Basic download/version functionality with dependency database
July: Installation functionality for select datasets
August: Updating and relocation functionality
Add as many other datasets as possible


Personal Background

My name is Aidan Findlater ([EMAIL PROTECTED]). I will be graduating
this May with two degrees, a BSc in Computing and a BSc (Honours) in
Bioc