Re: [ccp4bb] pdf to text

Oganesyan, Vaheh Mon, 13 Sep 2010 09:40:11 -0700

Thanks to everyone who took the time to answer and, in some cases, to
manipulate the file for me. Extremely helpful! Not only in this
particular case: CCP4BB has proven to be the best many times.
Below are the answers I received, in chronological order:


Albert Gluskov: I converted it in adobe acrobat pro, so now you can
manipulate it, but check all numbers, because usually software doesn't
recognize everything 100% correctly. (the file was attached)
_______________________________________________________________
Tiago Botelho: Why don't you try an OCR program to read the text for
you? It should do the trick...

Let me know if it worked!       
_______________________________________________________________

Jonas Boehringer: I have run OCR text recognition in Adobe Acrobat over
it and attached the .pdb file. You will still have to remove title, page
numbers and some errors in the last column but better than typing it all
by hand.
_______________________________________________________________

Judit Murray-Rust: Can't convert your pdf to text (tho note that I have
typed up bigger mols than this by hand in the past - it's all a question
of how badly you want it!). But here is one way to get a set of
coordinates reputedly of polymyxin B

1. look up in chemspider (my result =
http://www.chemspider.com/Chemical-Structure.8009375.html) - note there
are other polymyxin B derivatives here, so you probably want to start at
the front page of chemspider; that link is just so the rest of the
explanation makes sense.
2. select 3D under the black box on the lhs. you should get a jmol image
(rotatable) 
3. use the save button to save it as a mol file.
4. import the mol file into some display program (I used pymol) and save
the result in pdb format.
5. It is then up to you to check it is the molecule you want!


can get 3D mol file from chemspider, too. And convert to pdb with one of
many display progs. But whether it is exactly the same as what is in the
pdf file is an exercise for the user ;-) J
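
If no display program is at hand, the mol-to-pdb step can be sketched in
plain Python. This is a minimal sketch, not what Judit used: it assumes
a V2000 mol file with the usual fixed columns, writes HETATM records
only (no bonds/CONECT), and the function name mol_to_pdb, the residue
name "LIG" and chain "A" are made up for illustration.

```python
def mol_to_pdb(mol_text, res_name="LIG"):
    """Convert the atom block of a V2000 mol file to PDB HETATM records.

    Assumptions: fixed-column V2000 layout, one residue, chain A, residue
    number 1, occupancy 1.00 and B-factor 0.00. Bonds are ignored.
    """
    lines = mol_text.splitlines()
    # Line 4 is the counts line; the first 3 columns hold the atom count.
    n_atoms = int(lines[3][0:3])
    records = []
    for serial, line in enumerate(lines[4:4 + n_atoms], start=1):
        # V2000 atom line: x, y, z in three 10-column fields, then the symbol.
        x = float(line[0:10])
        y = float(line[10:20])
        z = float(line[20:30])
        element = line[31:34].strip()
        # Hypothetical naming scheme: element symbol plus serial number.
        # Names longer than 4 characters would break the PDB columns.
        name = f"{element}{serial}"
        records.append(
            f"HETATM{serial:5d} {name:<4s} {res_name:>3s} A   1    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}"
            f"          {element:>2s}"
        )
    records.append("END")
    return "\n".join(records) + "\n"
```

Usage would be along the lines of
open("out.pdb", "w").write(mol_to_pdb(open("in.mol").read())), with both
file names as placeholders. Step 5 still applies to the result.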

_______________________________________________________________

Paul Mcewan: Try using the PRODRG server..... if you can draw it, it
can make it!
 
http://davapc1.bioch.dundee.ac.uk/prodrg/
_______________________________________________________________

Paul Emsley: If you don't need the same conformation, how about
downloading the 2D mol file from chemspider?  Prodrg can convert that to
3D in a trice.
(screenshot attached)

_______________________________________________________________

Luca Jovin: Just ran your PDF file through an OCR program - output
attached. Obviously it will still need quite a bit of manual editing,
and I would make super-sure it read numbers correctly, but still it's a
start!
(text file attached)
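
Checking that the numbers were read correctly can be partly automated.
The following is a minimal sketch, assuming the OCR output is plain text
with whitespace-separated fields and that every intended number contains
a decimal point; the function name flag_suspicious and the look-alike
table are illustrative and not exhaustive. It only catches tokens that
are no longer valid numbers; a digit misread as another digit still has
to be checked against the PDF by eye.

```python
# Letters that OCR commonly confuses with digits (an assumed, partial list).
LOOKALIKES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)


def is_float(token):
    """Return True if the token parses as a floating-point number."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def flag_suspicious(lines):
    """List (line_number, token, suggestion) for damaged-looking decimals.

    A token is flagged when it contains a decimal point but does not parse
    as a number. The suggestion is the token with look-alike letters
    replaced by digits, or None if that still is not a number.
    """
    flagged = []
    for number, line in enumerate(lines, start=1):
        for token in line.split():
            if "." in token and not is_float(token):
                candidate = token.translate(LOOKALIKES)
                suggestion = candidate if is_float(candidate) else None
                flagged.append((number, token, suggestion))
    return flagged
```

Leftover titles and page headers that contain a period get flagged too
(with no suggestion), which is convenient since they have to be removed
anyway.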
_______________________________________________________________
Tim Gruene: You could try text recognition software. In the
Linux/Unix world, gocr is probably the most popular one. Since it is a
Courier font, your chances should be pretty good that it works.
_______________________________________________________________

Fred Vellieux: What you could try to do is print out the pdf file, then
locate a scanner with suitable scanning software. Several scanning
programs can generate word-processor output or plain ASCII. Since the
pdf file is text only (no figures etc.), it should be OK. You just have
to go through the generated output and check for errors (such software
is not "perfect" and produces errors here and there).
_______________________________________________________________
Tomas Malinauskas: PDB file:
http://129.128.185.122/drugbank2/drugs/DB00781/pdb/download
More information:
http://www.drugbank.ca/drugs/DB00781
_______________________________________________________________
Mark Brooks: For OCR without installing software, "Free OCR"
http://www.free-ocr.com/ works quite well for me, but beware that you
may need to do corrections afterwards. 
 
Just upload your file to this web site, as long as it isn't secret!
 
The OCR in Adobe Acrobat works better for me though, and is worth the
money, I think.
_______________________________________________________________

Ed Pozharski: Ran it through Adobe Acrobat OCR text recognition - I
think it is now selectable text. (proper pdf file attached).


In case anybody needs any of the files sent to me - just let me know.

Regards,


     Vaheh  




