Hi Aaditya,

Reading Hindi text is actually not as simple as reading English text. Most Hindi PDFs don't use a standard encoding.
An encoding assigns a code value to each character, and in a PDF each code corresponds to a glyph in a font. So a PDF takes the code, maps it to a glyph using a font map, and then renders that glyph. It does not know which character it is. So to read most Hindi PDFs, we have to know the mapping from codes to characters. At my previous company I worked with PDFs from Dainik Bhaskar and other Hindi newspapers and faced the same problem. So a generic Hindi PDF-to-text converter is not possible. But if you know the specific encoding, you might be able to write a converter for that particular kind of Hindi PDF.

Amal.

On Wed, Jun 2, 2010 at 2:50 AM, Srinivas Reddy Thatiparthy < srinivas_thatipar...@akebonosoft.com> wrote:

> Hindi is Unicode text, so your input data should be treated as Unicode
> instead of ASCII. Last but not least, the encoding in your editor should
> be set to Unicode, otherwise you will see garbled text.
>
> This is my guess; I have never worked with Unicode in Python. If I am
> wrong, please correct me.
>
> Thanks & Regards,
> Srinivas Reddy Thatiparthy,
> Mobile:9393099772,
>
> -----Original Message-----
> From: bangpypers-bounces+srinivas_thatiparthy=akebonosoft....@python.org on behalf of AADITYA SRIRAM
> Sent: Wed 6/2/2010 2:22 PM
> To: bangpypers@python.org
> Subject: [BangPypers] PyPDF to read hindi
>
> Hi guys, I am writing a small program to convert PDF to text files (I know
> it's easy and lame, but I need to start somewhere!). Anyway, I am not able
> to rip the Hindi text in readable form :( Can anyone please help? It's
> working fine with English text.
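
To make the point in the reply above concrete, here is a minimal sketch of a converter for one specific encoding. It assumes you have already extracted the raw codes from the PDF as a string. The table `LEGACY_TO_UNICODE` and the function `legacy_to_unicode` are hypothetical names, and the table entries are invented for illustration; a real table has to be built by inspecting the font your particular PDF uses.

```python
# Hypothetical table from a legacy font's codes to Unicode Devanagari.
# The entries are made up for illustration; they are not taken from any
# specific newspaper's font and must be replaced with a real table.
LEGACY_TO_UNICODE = {
    "d": "\u0915",   # KA
    "[k": "\u0916",  # KHA (two codes that together form one letter)
    "x": "\u0917",   # GA
    "k": "\u093e",   # vowel sign AA
}


def legacy_to_unicode(text, table=LEGACY_TO_UNICODE):
    """Translate extracted codes to Unicode, trying the longest key first."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # Unknown code: pass it through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(legacy_to_unicode("dk"))
```

With this made-up table, `legacy_to_unicode("dk")` yields KA followed by the vowel sign AA. A real converter would probably need more than a lookup table, since legacy Hindi fonts often store some vowel signs in visual order rather than Unicode's logical order, so treat this only as a starting point.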
_______________________________________________
BangPypers mailing list
BangPypers@python.org
http://mail.python.org/mailman/listinfo/bangpypers