Robert Kern wrote:
> If I may digress for a bit, my advisor is currently working on a project
> that is processing seafloor depth datasets starting from a few decades
> ago. A lot of this data was originally to be processed using FORTRAN
> software, so in the idiom of much FORTRAN software from those days, 9999
> is often used to mark missing data. Unfortunately, 9999 is a perfectly
> valid datum in most of the unit systems used by the various datasets.
>
> Now he has to find a grad student to trawl through the datasets and
> clean up the really invalid 9999's (as well as other such fun tasks like
> deciding if a dataset that says it's using feet is actually using meters).

I'm afraid this didn't end with FORTRAN. Not long ago I wrote a program for my wife that combined a data editor with a graph display, so that she could clean up time series of length and weight data for children (from an international research project carried out during the 90's). 99 cm is not an unreasonable length, but when you see it in a graph alongside the other length measurements, most of the false ones are easy to spot, as is a mistyped year in a date (common at the beginning of a new year).

Perhaps graphics can help this grad student too? It's certainly much easier to spot deviations in a curve than in an endless column of numbers, provided the curves would normally be reasonably smooth.
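
For what it's worth, the same "does it fit the curve?" test can be roughly approximated in code before anyone looks at a plot. The following is only a minimal sketch, not the program mentioned above: the function name `flag_suspect_sentinels`, the `window` and `tolerance` parameters, and the sample numbers are all made up for illustration, and it assumes the series is smooth enough that a real reading stays close to the median of its neighbours.

```python
from statistics import median


def flag_suspect_sentinels(values, sentinel, window=2, tolerance=0.2):
    """Return indices where `sentinel` is probably a missing-data marker.

    A reading equal to `sentinel` is flagged when it differs from the
    median of its non-sentinel neighbours by more than `tolerance`,
    expressed as a fraction of that median. (Hypothetical helper.)
    """
    suspects = []
    for i, value in enumerate(values):
        if value != sentinel:
            continue
        start = max(0, i - window)
        # Up to `window` readings on each side, skipping other sentinels.
        neighbours = [
            x
            for j, x in enumerate(values[start:i + window + 1], start=start)
            if j != i and x != sentinel
        ]
        if not neighbours:
            # Nothing to compare against: leave it for a human to judge.
            suspects.append(i)
            continue
        centre = median(neighbours)
        if abs(value - centre) > tolerance * abs(centre):
            suspects.append(i)
    return suspects


# Invented length measurements in cm; 99 appears once as a marker and
# once as a plausible real reading.
lengths_cm = [86.0, 88.5, 99, 91.0, 93.5, 96.0, 99, 101.5]
print(flag_suspect_sentinels(lengths_cm, sentinel=99, tolerance=0.05))
```

With these invented numbers only the first 99 is flagged, since the second one sits comfortably between 96.0 and 101.5. This would not replace looking at the graph; it only narrows down where to look, and the threshold would have to be tuned per dataset.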