RE: Script to extract text from PDF files

2015-11-08 Thread Dan Strohl
On Tuesday, September 25, 2007 at 1:41:56 PM UTC-4, brad wrote: > I have a very crude Python script that extracts text from some (and I > emphasize some) PDF documents. On many PDF docs, I cannot extract > text, but this is because I'm doing something

Re: Script to extract text from PDF files

2015-11-06 Thread Scott Werner
On Tuesday, September 25, 2007 at 1:41:56 PM UTC-4, brad wrote: > I have a very crude Python script that extracts text from some (and I > emphasize some) PDF documents. On many PDF docs, I cannot extract text, > but this is because I'm doing something wrong. The PDF spec is large and > complex a

Re: Script to extract text from PDF files

2015-11-05 Thread zbin1986
You can try this free online PDF text extractor, http://www.online-code.net/pdf-to-word.html, to extract text from PDF files. -- https://mail.python.org/mailman/listinfo/python-list

Re: Script to extract text from PDF files

2007-09-27 Thread Svenn Are Bjerkem
On Sep 26, 11:50 pm, [EMAIL PROTECTED] wrote: > On Sep 26, 4:49 pm, Svenn Are Bjerkem <[EMAIL PROTECTED]> > wrote: > > > I have downloaded this package and installed it and found that the > > text-extraction is more or less useless. Looking into the code and > > comparing with the PDF spec show a v

Re: Script to extract text from PDF files

2007-09-26 Thread David Boddie
On Wed Sep 26 23:50:16 CEST 2007, byte8bits wrote: > On Sep 26, 4:49 pm, Svenn Are Bjerkem > wrote: > > > I have downloaded this package and installed it and found that the > > text-extraction is more or less useless. Looking into the code and > > comparing with the PDF spec show a very early im

Re: Script to extract text from PDF files

2007-09-26 Thread byte8bits
On Sep 26, 4:49 pm, Svenn Are Bjerkem <[EMAIL PROTECTED]> wrote: > I have downloaded this package and installed it and found that the > text-extraction is more or less useless. Looking into the code and > comparing with the PDF spec show a very early implementation of text > extraction. Luckily it

Re: Script to extract text from PDF files

2007-09-26 Thread Svenn Are Bjerkem
On Sep 25, 9:18 pm, [EMAIL PROTECTED] wrote: > On Sep 25, 3:02 pm, Paul Hankin <[EMAIL PROTECTED]> wrote: > > > Googling for 'pdf to text python' and following the first link > > gives http://pybrary.net/pyPdf/ > > Doesn't work that well, I've tried it, you should too... the author > even admits th

Re: Script to extract text from PDF files

2007-09-26 Thread brad
David Boddie wrote: > There's a little information on that online: > http://www.glyphandcog.com/textext.html Thanks, I'll read that. > Just because inserting and encoding is well documented doesn't mean that the > reverse processes are easy. :-/ Boy, that's an understatement... most of the PDF

Script to extract text from PDF files

2007-09-26 Thread David Boddie
On Wed Sep 26 15:06:54 CEST 2007, byte8bits wrote: > On Sep 25, 10:19 pm, Lawrence D'Oliveiro wrote: > > > This is inherent in the nature of PDF: it's a page-description language, > > not a document-interchange language. Each text-drawing command can put a > > block of t

Re: Script to extract text from PDF files

2007-09-26 Thread byte8bits
On Sep 25, 10:19 pm, Lawrence D'Oliveiro <[EMAIL PROTECTED] central.gen.new_zealand> wrote: > > Doesn't work that well... > > This is inherent in the nature of PDF: it's a page-description language, not > a document-interchange language. Each text-drawing command can put a block > of text anywhere
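
As a rough illustration of the point quoted above (this is not code from the thread): if each text-drawing command can place a fragment anywhere on the page, an extractor has to reorder the fragments by position before the text reads sensibly. The sketch below assumes hypothetical (x, y, text) tuples that have already been pulled out of a page, with y growing upwards as in PDF user space. The function name reading_order and the line_tolerance parameter are made up for this example, and it ignores font encodings, transforms, and multi-column layouts, which a real extractor would also have to handle.

```python
def reading_order(fragments, line_tolerance=2.0):
    """Group (x, y, text) fragments into lines and return the page text.

    Assumption: fragments are hypothetical tuples in PDF user space,
    where a larger y means higher on the page.
    """
    # Visit fragments from the top of the page to the bottom.
    ordered = sorted(fragments, key=lambda f: -f[1])

    lines = []  # each entry is [baseline_y, [(x, text), ...]]
    for x, y, text in ordered:
        # Fragments whose baselines are close enough share a line.
        if lines and abs(lines[-1][0] - y) <= line_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])

    # Within each line, read left to right.
    return "\n".join(
        " ".join(text for _, text in sorted(parts, key=lambda p: p[0]))
        for _, parts in lines
    )


# Fragments listed in the order a hypothetical content stream drew them.
sample = [
    (120.0, 700.0, "world"),
    (72.0, 650.0, "Second line"),
    (72.0, 700.0, "Hello"),
]
print(reading_order(sample))
```

Sorting by position like this is only a heuristic; it is one reason extraction that simply concatenates strings in drawing order can produce scrambled output.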

Re: Script to extract text from PDF files

2007-09-25 Thread Lawrence D'Oliveiro
In message <[EMAIL PROTECTED]>, [EMAIL PROTECTED] wrote: > On Sep 25, 3:02 pm, Paul Hankin <[EMAIL PROTECTED]> wrote: > >> Googling for 'pdf to text python' and following the first link >> gives http://pybrary.net/pyPdf/ > > Doesn't work that well... This is inherent in the nature of PDF: it's a

Re: Script to extract text from PDF files

2007-09-25 Thread byte8bits
On Sep 25, 3:02 pm, Paul Hankin <[EMAIL PROTECTED]> wrote: > Googling for 'pdf to text python' and following the first link > gives http://pybrary.net/pyPdf/ Doesn't work that well, I've tried it, you should too... the author even admits this: extractText() [#] Locate all text drawing comman

Re: Script to extract text from PDF files

2007-09-25 Thread Paul Hankin
On Sep 25, 6:41 pm, brad <[EMAIL PROTECTED]> wrote: > I have a very crude Python script that extracts text from some (and I > emphasize some) PDF documents. On many PDF docs, I cannot extract text, > but this is because I'm doing something wrong. The PDF spec is large and > complex and there are va

Script to extract text from PDF files

2007-09-25 Thread brad
I have a very crude Python script that extracts text from some (and I emphasize some) PDF documents. On many PDF docs, I cannot extract text, but this is because I'm doing something wrong. The PDF spec is large and complex and there are various ways in which to store and encode text. I wanted t