extract text from PDF files
On Tuesday, September 25, 2007 at 1:41:56 PM UTC-4, brad wrote:
> I have a very crude Python script that extracts text from some (and I
> emphasize some) PDF documents. On many PDF docs, I cannot extract text,
> but this is because I'm doing something wrong. The PDF spec is large and
> complex a
You can try this free online PDF text extractor,
http://www.online-code.net/pdf-to-word.html, to extract text from PDF
files.
--
https://mail.python.org/mailman/listinfo/python-list
On Sep 26, 11:50 pm, [EMAIL PROTECTED] wrote:
> On Sep 26, 4:49 pm, Svenn Are Bjerkem <[EMAIL PROTECTED]>
> wrote:
>
> > I have downloaded this package and installed it and found that the
> > text-extraction is more or less useless. Looking into the code and
> > comparing with the PDF spec shows a v
On Wed Sep 26 23:50:16 CEST 2007, byte8bits wrote:
> On Sep 26, 4:49 pm, Svenn Are Bjerkem
> wrote:
>
> > I have downloaded this package and installed it and found that the
> > text-extraction is more or less useless. Looking into the code and
> > comparing with the PDF spec shows a very early im
On Sep 26, 4:49 pm, Svenn Are Bjerkem <[EMAIL PROTECTED]>
wrote:
> I have downloaded this package and installed it and found that the
> text-extraction is more or less useless. Looking into the code and
> comparing with the PDF spec shows a very early implementation of text
> extraction. Luckily it
On Sep 25, 9:18 pm, [EMAIL PROTECTED] wrote:
> On Sep 25, 3:02 pm, Paul Hankin <[EMAIL PROTECTED]> wrote:
>
> > Googling for 'pdf to text python' and following the first link
> > gives http://pybrary.net/pyPdf/
>
> Doesn't work that well, I've tried it, you should too... the author
> even admits th
David Boddie wrote:
> There's a little information on that online:
> http://www.glyphandcog.com/textext.html
Thanks, I'll read that.
> Just because inserting and encoding is well documented doesn't mean that the
> reverse processes are easy. :-/
Boy, that's an understatement... most of the PDF
On Wed Sep 26 15:06:54 CEST 2007, byte8bits wrote:
> On Sep 25, 10:19 pm, Lawrence D'Oliveiro wrote:
>
> > This is inherent in the nature of PDF: it's a page-description language,
> > not a document-interchange language. Each text-drawing command can put a
> > block of t
On Sep 25, 10:19 pm, Lawrence D'Oliveiro <[EMAIL PROTECTED]> wrote:
> > Doesn't work that well...
>
> This is inherent in the nature of PDF: it's a page-description language, not
> a document-interchange language. Each text-drawing command can put a block
> of text anywhere
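
To illustrate the point about placement: if an extractor recovers text fragments together with their page coordinates, it still has to put them back into reading order itself. A minimal sketch, assuming hypothetical (x, y, text) tuples with y growing upward as in PDF user space; the function name and the tolerance value are made up for illustration and do not come from pyPdf or any other library:

```python
from collections import defaultdict

def reading_order(fragments, line_tolerance=2.0):
    """Group (x, y, text) fragments into lines, top of page first.

    Assumption: y grows upward, so a larger y means higher on the page.
    """
    lines = defaultdict(list)
    for x, y, text in fragments:
        # Crude bucketing: fragments with nearly equal baselines share a key.
        key = round(y / line_tolerance)
        lines[key].append((x, text))
    ordered = []
    for key in sorted(lines, reverse=True):  # highest baseline first
        # Within a line, order fragments left to right by x.
        ordered.append(" ".join(text for x, text in sorted(lines[key])))
    return ordered

# Fragments listed in drawing order, which need not be reading order.
fragments = [
    (300, 700, "world"),
    (72, 100, "footer"),
    (72, 700, "Hello"),
    (72, 680, "second line"),
]
print(reading_order(fragments))
```

Real documents also need column detection, rotated text, and proper word spacing, none of which this sketch attempts.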
In message <[EMAIL PROTECTED]>,
[EMAIL PROTECTED] wrote:
> On Sep 25, 3:02 pm, Paul Hankin <[EMAIL PROTECTED]> wrote:
>
>> Googling for 'pdf to text python' and following the first link
>> gives http://pybrary.net/pyPdf/
>
> Doesn't work that well...
This is inherent in the nature of PDF: it's a
On Sep 25, 3:02 pm, Paul Hankin <[EMAIL PROTECTED]> wrote:
> Googling for 'pdf to text python' and following the first link
> gives http://pybrary.net/pyPdf/
Doesn't work that well, I've tried it, you should too... the author
even admits this:
extractText() [#]
Locate all text drawing comman
On Sep 25, 6:41 pm, brad <[EMAIL PROTECTED]> wrote:
> I have a very crude Python script that extracts text from some (and I
> emphasize some) PDF documents. On many PDF docs, I cannot extract text,
> but this is because I'm doing something wrong. The PDF spec is large and
> complex and there are va
I have a very crude Python script that extracts text from some (and I
emphasize some) PDF documents. On many PDF docs, I cannot extract text,
but this is because I'm doing something wrong. The PDF spec is large and
complex and there are various ways in which to store and encode text. I
wanted t
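
For readers wondering what a crude approach of this kind might look like, here is a minimal sketch. It is not the script described above, only an illustration under stated assumptions: the input is a single page content stream, optionally Flate-compressed, and the text is shown with literal strings and the Tj operator. The names crude_extract and unescape are hypothetical.

```python
import re
import zlib

# A literal string operand followed by the Tj (show text) operator,
# e.g. "(Hello) Tj". Backslash-escaped characters are allowed inside.
TJ_PATTERN = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")

# Only the simplest escapes are undone: \( \) and \\.
ESCAPES = {b"\\(": b"(", b"\\)": b")", b"\\\\": b"\\"}

def unescape(raw):
    return re.sub(rb"\\[()\\]", lambda m: ESCAPES[m.group(0)], raw)

def crude_extract(stream, compressed=False):
    """Return the literal strings shown with Tj, in stream order."""
    if compressed:
        # Assumption: the stream uses FlateDecode, i.e. zlib compression.
        stream = zlib.decompress(stream)
    return [
        unescape(m.group(1)).decode("latin-1")
        for m in TJ_PATTERN.finditer(stream)
    ]

sample = b"BT /F1 12 Tf 72 720 Td (Hello \\(PDF\\)) Tj ET BT (world) Tj ET"
print(crude_extract(sample))
print(crude_extract(zlib.compress(sample), compressed=True))
```

This ignores hex strings, the TJ array operator, font encodings, and every filter other than Flate, which is one reason such a script works on some PDFs and not on others.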