Re: [BangPypers] extracting unicode text from pdfs

2010-05-24 Thread Eknath Venkataramani
ra- 'you might not have those fonts installed in your system' -- Eknath Venkataramani ___ BangPypers mailing list BangPypers@python.org http://mail.python.org/mailman/listinfo/bangpypers

Re: [BangPypers] extracting unicode text from pdfs

2010-05-24 Thread Eknath Venkataramani
> On Mon, May 24, 2010 at 7:13 PM, Eknath Venkataramani <eknath.i...@gmail.com> wrote: > > I have around 45 pdfs to convert into raw text containing text in _HINDI_. > > When I use the xpdf package, the generated text is very weird, so I'd like

[BangPypers] extracting unicode text from pdfs

2010-05-24 Thread Eknath Venkataramani
weird symbols: '... ...' while I'd like 'आदमी मुसाफिर है' to be the output -- Eknath Venkataramani

Re: [BangPypers] Coaching institute in Bangalore.

2010-04-16 Thread Eknath Venkataramani
ning to Fail" > > Blog:kunalghosh.wordpress.com > Website:www.kunalghosh.net46.net > V-card:http://tinyurl.com/86qjyk -- Eknath Venkataramani +91-9844952442

[BangPypers] pyparsing wrong output

2010-02-12 Thread Eknath Venkataramani
I am trying to write a parser in pyparsing and need help. The code is at http://paste.pocoo.org/show/177078/ and the input file is at http://paste.pocoo.org/show/177076/ . I get output as: * * -- Eknath Venkataramani +91-9844952442

Re: [BangPypers] UTF-8 character

2010-01-31 Thread Eknath Venkataramani
On Sun, Jan 31, 2010 at 8:02 AM, Senthil Kumaran wrote: > > Yup. That is perfect. That emacs-style line declares to the > interpreter that the following python script uses UTF-8 encoding. You > might choose to use other encodings similarly too. > > Yeah. Thanks.
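For anyone reading the archive later, here is a small sketch of what that declaration line does. It uses the standard library's `tokenize.detect_encoding`, which reads the first lines of a source file the same way the interpreter does. The helper name `declared_encoding` and the two sample sources are made up for illustration. Note that this runs on Python 3, where UTF-8 is already the default source encoding; on the Python 2 of this thread the default was ASCII, which is why the line was needed.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the source encoding Python would pick for these bytes."""
    # detect_encoding wants a readline callable that yields bytes.
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


# Hypothetical two-line scripts, each with an emacs-style declaration.
utf8_source = b"# -*- coding: utf-8 -*-\nprint('hi')\n"
latin1_source = b"# -*- coding: latin-1 -*-\nprint('hi')\n"

print(declared_encoding(utf8_source))
print(declared_encoding(latin1_source))
```

The encoding names come back in normalized form, so a `latin-1` declaration is reported as `iso-8859-1`.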

Re: [BangPypers] UTF-8 character

2010-01-29 Thread Eknath Venkataramani
On Fri, Jan 29, 2010 at 11:29 PM, Eknath Venkataramani < eknath.i...@gmail.com> wrote: > I am trying to write a program to generate a file that simply removes all > the punctuation marks from the input file. > for the usual ascii characters like .,'?!" it works. but th

[BangPypers] UTF-8 character

2010-01-29 Thread Eknath Venkataramani
I am trying to write a program that generates a file with all the punctuation marks removed from the input file. For the usual ASCII characters like .,'?!" it works, but when I try to do the same for the Hindi full stop (similar to |), it gives me an error saying: SyntaxError: Non-ASCII cha
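A minimal sketch of one way to do this, not the poster's actual program (which is not shown in the thread). It assumes the "Hindi full stop" is the Devanagari danda (U+0964) and also strips the double danda (U+0965). The name `strip_punctuation` and the sample string are illustrative only. The danda is written as a `\u` escape so the source file itself stays pure ASCII, which sidesteps the Non-ASCII SyntaxError even without an encoding declaration.

```python
import string

# ASCII punctuation plus the Devanagari danda and double danda.
# Written as escapes so this file contains no non-ASCII bytes.
DANDA = "\u0964"
DOUBLE_DANDA = "\u0965"
PUNCTUATION = string.punctuation + DANDA + DOUBLE_DANDA

# Map every punctuation code point to None so translate() deletes it.
DELETE_TABLE = {ord(ch): None for ch in PUNCTUATION}


def strip_punctuation(text):
    """Return text with ASCII punctuation and dandas removed."""
    return text.translate(DELETE_TABLE)


sample = "Hello, world!" + DANDA + " ok?"
print(strip_punctuation(sample))
```

When reading the real input file, opening it with an explicit `encoding="utf-8"` (assuming the file is UTF-8) gives text that can be passed to the function directly.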

[BangPypers] How should I do it?

2010-01-14 Thread Eknath Venkataramani
I have a txt file in the following format:
[code]
"confident" => {
    count => 4,
    trans => {
        "ashahvasahta" => 0.74918568,
        "atahmavaishahvaasa" => 0.09095465,
        "pahraaram\.nbha" => 0.06990729,
        "mailatae" => 0.02856427,
        "utanai" => 0.01929341,
        "anaa" =>