subject:"[BangPypers] extracting unicode text from pdfs"

Re: [BangPypers] extracting unicode text from pdfs

2010-05-24 Thread Eknath Venkataramani

On Mon, May 24, 2010 at 7:51 PM, Dhananjay Nene wrote:
> You may want to try out pdfminer. It's very similar to xpdf in structure and
> should give you the parsed data into unicode directly.
>
Tried, but I got the same output as xpdf. I guess it's because of the point mentioned by Gora- 'you might n

Re: [BangPypers] extracting unicode text from pdfs

2010-05-24 Thread Eknath Venkataramani

Tried .. didn't work out well enough. The output is the same as what I get out of xpdf.

On Mon, May 24, 2010 at 7:51 PM, Dhananjay Nene wrote:
> You may want to try out pdfminer. It's very similar to xpdf in structure and
> should give you the parsed data into unicode directly.
>
> On Mon, May 24, 2010

Re: [BangPypers] extracting unicode text from pdfs

2010-05-24 Thread Gora Mohanty

On Mon, 24 May 2010 19:13:26 +0530 Eknath Venkataramani wrote:
> I have around 45 pdfs to convert into raw text containing text in
> _HINDI_. When I use the xpdf package, the generated text is very
> weird, so I'd like to write a program which would convert the pdf
> text into Unicode text as it

Re: [BangPypers] extracting unicode text from pdfs

2010-05-24 Thread Dhananjay Nene

You may want to try out pdfminer. It's very similar to xpdf in structure and should give you the parsed data into unicode directly.

On Mon, May 24, 2010 at 7:13 PM, Eknath Venkataramani wrote:
> I have around 45 pdfs to convert into raw text containing text in _HINDI_.
> When I use the xpdf pack
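A minimal sketch of how the batch of PDFs could be run through whichever extractor is being tried, with the output saved as UTF-8. The function name `convert_all` and the `extract` callable are hypothetical, introduced here only for illustration; the thread does not show any code.

```python
from pathlib import Path
from typing import Callable, List


def convert_all(pdf_dir: Path, out_dir: Path,
                extract: Callable[[str], str]) -> List[Path]:
    """Run `extract` on every *.pdf in pdf_dir and save each result as UTF-8 text.

    `extract` is any function that takes a PDF path and returns a str;
    it is a placeholder for the tool under test, not a specific library call.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        text = extract(str(pdf_path))
        txt_path = out_dir / (pdf_path.stem + ".txt")
        # Write explicitly as UTF-8 so Devanagari survives regardless of locale.
        txt_path.write_text(text, encoding="utf-8")
        written.append(txt_path)
    return written
```

With the current pdfminer.six fork installed, `pdfminer.high_level.extract_text` is one candidate to pass as `extract`. As the replies in this thread report, the extractor may still return the same text as xpdf for these particular files.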

[BangPypers] extracting unicode text from pdfs

2010-05-24 Thread Eknath Venkataramani

I have around 45 pdfs to convert into raw text containing text in _HINDI_. When I use the xpdf package, the generated text is very weird, so I'd like to write a program which would convert the pdf text into Unicode text as it is. The fonts used in the pdfs: name
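One way to put a number on "very weird" output, assuming the expected text is Hindi in the Unicode Devanagari block (U+0900 to U+097F), is to measure what fraction of the extracted characters fall in that block. This is only a sketch; the helper name `devanagari_ratio` is hypothetical and not from the thread.

```python
def devanagari_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters in U+0900..U+097F."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for ch in chars if "\u0900" <= ch <= "\u097f")
    return hits / len(chars)


# Real Devanagari text scores 1.0; text made only of Latin letters scores 0.0.
print(devanagari_ratio("\u0939\u093f\u0928\u094d\u0926\u0940"))
print(devanagari_ratio("abc xyz"))
```

A ratio near zero on a file that should contain Hindi suggests the extractor returned raw font codes instead of Unicode text, so comparing this value across tools gives a quick check on whether one of them did any better.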


5 matches
