On Wednesday 18 August 2010 11:25:19 am Andreas Förster wrote:
> Thanks to everyone for the good ideas and suggestions.  Let me clarify
> what I want: a simple system that does one task.  I'm with James Holton
> on complexity and with several others on wikis and databases.  They're
> simple to set up and easy to use, but no one uses them besides the one
> who implemented them.  I've seen this with a lab wiki and a plasmid
> database.  If the boss just approves of the project but doesn't enforce
> usage, it won't be used.
>
> That's why what I really want is an unavoidable system.
Our protocol makes use of a FileMaker database (the one Juergen Bosch
mentioned earlier) that tracks all mounted crystals.  It is both handy
and, as you say you want (but be careful what you wish for), unavoidable.
Juergen was largely responsible for setting it up in the first place,
and it has remained in continuous use since then.

This works for us because the great bulk of our data collection is done
using the BluIce interface to the SSRL beamlines.  As a requirement for
data collection, users must provide a spreadsheet that indexes each
crystal and its location in the SSRL sample cassette.  We create this
spreadsheet directly as an export from our lab database.  The database
itself assigns a unique systematic directory name to each crystal.  The
spreadsheet is then used by the beamline software to screen and collect
data from all the crystals.  The beamline software fills in screening
information as it goes, including the cell dimensions, etc., as
determined by the automated software.  The data images for each crystal
are put into a uniquely named directory as specified in the spreadsheet.
After the run, the updated spreadsheet is merged back into our lab
database, and the data images are archived keeping their systematic,
uniquely determined directory names.

Yes, if you work hard at it you can manage to mess up, say, the
human-interpretable meaning of the assigned systematic name.  But you
cannot avoid the system altogether, because the only way to reserve a
slot for your crystal in the cassette being sent for data collection is
to enter its identifying information in the lab database.

There is still room to lose track of archived data at a larger scale.
Last I asked, TARDIS and the like cannot really help much with this.
If your 600 Gigabytes of archived data from 2008 are indexed as being
stored on disk XD_2008_2 in Room K407 of building HSB, it can tell you
exactly what directory on that disk corresponds to the data from which
crystal.
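As a minimal sketch of the spreadsheet-export step described above: each
crystal record gets a systematic directory name derived from its database
key, and the rows are written out as the cassette spreadsheet.  The column
names and the naming scheme here are illustrative assumptions, not the
actual SSRL spreadsheet layout or our FileMaker export.

```python
import csv
import io

def cassette_sheet(crystals, run_tag):
    """Build cassette-spreadsheet rows for a list of (port, crystal_id)
    pairs.  The directory name is derived solely from the database key,
    so it is unique and reproducible.  (Column names are illustrative,
    not the real SSRL layout.)"""
    rows = []
    for port, crystal_id in crystals:
        dirname = f"{run_tag}_{crystal_id:04d}"   # e.g. XD_2010_08_0007
        rows.append({"Port": port, "CrystalID": crystal_id,
                     "Directory": dirname})
    return rows

def write_sheet(rows, fh):
    """Export rows as CSV, ready to hand to the beamline software."""
    writer = csv.DictWriter(fh, fieldnames=["Port", "CrystalID", "Directory"])
    writer.writeheader()
    writer.writerows(rows)

# Example: three crystals in cassette ports A1-A3
rows = cassette_sheet([("A1", 7), ("A2", 8), ("A3", 12)], "XD_2010_08")
buf = io.StringIO()
write_sheet(rows, buf)
```

Because the directory name is a pure function of the database key, the
same crystal always maps to the same archive directory, which is what
makes the post-run merge back into the database unambiguous.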
Unfortunately, TARDIS doesn't tell you that in fact that disk was moved
to a room down the hall 6 months ago when the lab was reorganized :-)

The drawbacks of this system are:

- I wish I knew of an open-source linux-compatible equivalent to
  FileMaker.  Nothing else I have looked at offered this level of easy
  yet controlled access via a web browser from remote locations.
- Compliance with the protocol drops to less than 100% for datasets
  collected at home rather than at a beamline.
- One is still faced with the issue of how to deal with archiving
  terabytes of data.

	Ethan

> I'm thinking of an uploader that sits on the file server.  Only the
> uploader has write permission.  The user calls the uploader because
> data is only backed up on the file server, puts the data directory
> name into a box and fills in a few other boxes (four or five) because
> otherwise the uploader won't work.  The uploader interface could then
> be used to query the file server and find datasets.  But the key is
> that the system MUST be used to archive data - basically like flickr,
> but with the tag boxes mandatory.  It looks like TARDIS
> (http://tardis.edu.au/) might have such capabilities.
>
> The discussion regarding LIMS and ISPyB and other fancy tracking
> systems was fascinating, but I don't see those as relevant for my
> archiving task.  For the same reason, xTrack doesn't fit my bill.  I
> want to bury data, but not so deep that I don't find them should I
> ever need to.  I don't care about space group or crystallization
> conditions or processing information - the CCP4_DATABASE breaks with
> time anyway, either because a user renamed directories or because the
> user's home directory has been moved to /oldhome to make space for
> new users.  I just want to be able to always find old data.
>
> Going off on a tangent, associating a jpg of the first image (with
> resolution rings) to each dataset is great.  Can the generation of
> such images be automated, ie.
> a script for the whole directory tree?
>
> All best.
>
> Andreas
>
> On 18/08/2010 11:44, Eleanor Dodson wrote:
>> I would contact Johan Turkenburg here - he and Sam Hart have
>> organised the York data archive brilliantly - it is now pretty
>> straightforward to access any data back to ~1998 I think..
>>
>> Eleanor
>> j...@ysbl.york.ac.uk
>>
>> Andreas Förster wrote:
>>> Dear all,
>>>
>>> going through some previous lab member's data and trying to make
>>> sense of it, I was wondering what kind of solutions exist to
>>> simplify the archiving and retrieval process.
>>>
>>> In particular, what I have in mind is a web interface that allows a
>>> user who has just returned from the synchrotron or the in-house
>>> detector to fill in a few boxes (user, name of protein, mutant,
>>> light source, quality of data, number of frames, status of project,
>>> etc.) and then upload his data from the USB stick, portable hard
>>> drive or remote storage.
>>>
>>> The database application would put the data in a safe place (some
>>> file server that's periodically backed up) and let users browse
>>> through all the collected data of the lab with minimal effort later.
>>>
>>> It doesn't seem too hard to implement this, which is why I'm asking
>>> if anyone has done so already.
>>>
>>> Thanks.
>>>
>>> Andreas

-- 
Ethan A Merritt
Biomolecular Structure Center, K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742
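[As an editorial aside: the tangent question quoted above, generating a
jpg of the first image for every dataset in a directory tree, can be
scripted.  Below is a minimal sketch: it walks the tree, picks the
lexically first diffraction frame in each directory, and hands each one
to a converter.  The converter itself (`render_jpeg`) is a placeholder;
in practice you would shell out to whatever image-rendering tool your
lab uses, and the exact invocation varies by tool, so none is assumed
here.  The extension list is likewise illustrative.]

```python
import os

# Placeholder converter: in practice this would invoke an external
# diffraction-image renderer; the name `render_jpeg` is hypothetical.
def render_jpeg(image_path, jpeg_path):
    pass  # stub - replace with a call to your lab's rendering tool

# Illustrative set of detector-image extensions to look for.
IMAGE_EXTS = (".img", ".cbf", ".mccd", ".osc")

def first_frames(root):
    """Return (first_image, jpeg_target) for every dataset directory
    under `root`, taking the lexically first frame in each directory."""
    pairs = []
    for dirpath, _dirs, files in os.walk(root):
        frames = sorted(f for f in files if f.lower().endswith(IMAGE_EXTS))
        if frames:
            first = os.path.join(dirpath, frames[0])
            pairs.append((first, first + ".jpg"))
    return pairs

def make_thumbnails(root, convert=render_jpeg):
    """Generate one thumbnail per dataset directory under `root`."""
    for img, jpg in first_frames(root):
        convert(img, jpg)
```

Run once over the archive root, this produces exactly one thumbnail per
dataset directory, which could then be linked from the lab database.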