You may want to try pdfminer. It's very similar to xpdf in structure and
should give you the parsed text as Unicode directly.
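
A minimal sketch of how the batch conversion could look, assuming the
maintained pdfminer.six fork (which provides pdfminer.high_level.extract_text).
The helper name convert_all and the directory names are made up for
illustration. If the PDFs use legacy Hindi fonts without a ToUnicode map, the
output may still need a font-specific remapping step afterwards.

```python
from pathlib import Path


def convert_all(src_dir, dst_dir, extract=None):
    """Write one UTF-8 .txt file per PDF in src_dir; return the paths written."""
    if extract is None:
        # Assumption: pdfminer.six is installed. Imported lazily so the
        # helper can also be driven by any other extractor function.
        from pdfminer.high_level import extract_text as extract
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        text = extract(str(pdf))  # extractor returns a Python str (Unicode)
        out = dst / (pdf.stem + ".txt")
        out.write_text(text, encoding="utf-8")
        written.append(out)
    return written
```

Something like convert_all("pdfs", "txt") would then process every PDF in a
hypothetical "pdfs" directory in one go.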
On Mon, May 24, 2010 at 7:13 PM, Eknath Venkataramani wrote:
I have around 45 PDFs containing text in _HINDI_ that I need to convert into
raw text. When I use the xpdf package, the generated text comes out garbled,
so I'd like to write a program that converts the PDF text into Unicode text
as is.
The fonts used in the pdfs:
name