On Wednesday, November 17, 2010 01:42:40 am James Stroud wrote:
> 
> I did a 5 minute search for an example, and the best I could do with the 
> patience I had was this:
> 
> http://onlinelibrary.wiley.com/doi/10.1002/pmic.200700038/suppinfo
> 
> You'll see in the available PDF file Tables S1-S3. Were I to look for any 
> significant amount of time, I could find much more egregious examples.
> 
> For this particular example, your eyes may deceive you into thinking that the 
> PDF file can be parsed and the data represented in the tables extracted with 
> a script of some sort. But, if you have the patience, go to Table S3 and 
> start selecting text at "Accession Number" in the heading. You'll find that 
> the selection goes down that column only about halfway and then begins 
> selecting at the next column, "Swissprot Identifier".
> 
> So basically, the data represented in these tables is useless for any 
> computational analysis by the end user except for (1) those who wish to type 
> the data in by hand or (2) individuals like Dr. Merritt who can presumably 
> just read the data and do the analysis in cranio.


merritt [36] which in_cranio
             in_cranio:       aliased to pdftotext -layout

merritt [37] in_cranio pro200700038_s.pdf



The result is a set of nicely formatted ASCII tables with the column
headings maintained correctly.  OK, the sequence alignment is mangled, but
that's not the "data" part.
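
For anyone who wants to go one step further and feed those tables to a
script, here is a minimal sketch in Python. It assumes pdftotext is on the
PATH and that columns in the -layout output are separated by at least two
spaces, which will not hold for every table. The function names are mine,
purely for illustration.

```python
import re
import subprocess


def pdf_to_layout_text(pdf_path):
    """Run `pdftotext -layout` on pdf_path and return the extracted text."""
    # Passing "-" as the output file makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def split_columns(line):
    """Split one layout-preserved line into cells.

    Assumption: cells are separated by runs of two or more whitespace
    characters, so single spaces inside a cell are kept.
    """
    return re.split(r"\s{2,}", line.strip())


def table_rows(text):
    """Return the non-blank lines of text, each split into cells."""
    return [split_columns(line) for line in text.splitlines() if line.strip()]
```

Something like table_rows(pdf_to_layout_text("pro200700038_s.pdf")) would then
give one list of cells per line. Cells that are empty in the PDF will shift
the columns, so a real script would more likely slice on the character
offsets of the header line.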

        cheers,

                Ethan "40 years of augmenting brain cycles with code" Merritt


-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742
