Hi,
I suggest you have a look at Apache Tika: http://tika.apache.org

You can easily run a "java -jar tika.jar" command from Python with tools like
os.popen, and convert files in various formats to text.
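
For what it's worth, here is a minimal sketch of that approach. It uses
subprocess rather than os.popen, which is the usual choice in current Python.
The jar path, the file names and the helper names (build_tika_command,
extract_text, contains) are placeholders of mine, and I am assuming the Tika
app jar's --text option, which prints the plain-text content to stdout. The
UTF-8 decoding is also an assumption about the output encoding.

```python
import subprocess


def build_tika_command(jar_path, document_path):
    """Build the argument list for a Tika command-line text extraction."""
    # --text asks the Tika app to print the plain-text content to stdout.
    return ["java", "-jar", jar_path, "--text", document_path]


def extract_text(jar_path, document_path):
    """Run Tika on one document and return the extracted text."""
    result = subprocess.run(
        build_tika_command(jar_path, document_path),
        capture_output=True,
        check=True,  # raise CalledProcessError if java or Tika fails
    )
    # Assumption: the output is UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")


def contains(jar_path, document_path, needle):
    """Case-insensitive substring search in the extracted text."""
    return needle.lower() in extract_text(jar_path, document_path).lower()
```

Something like contains("tika.jar", "upload.pdf", "invoice") would then tell
you whether the string occurs in the uploaded file. Starting a JVM for every
document is slow, so for many files you would probably want to extract the
text once and feed it into a proper search index.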

There's even a Python wrapper based on JCC, but I'm not sure whether it's still
maintained:
http://redmine.djity.net/projects/pythontika/wiki

Regards,
Thomas
--
On 11.06.2013 at 12:05, Vishrut Mehta <vishrut.mehta...@gmail.com> wrote:

> Hello everyone,
> I am Vishrut Mehta, currently a third-year student at IIIT
> Hyderabad, India. I have been contributing to open source for two years
> and have contributed to organizations like E-cidadania, Sahana
> Software Foundation, Gnome, etc. I am very interested in search engines and
> search-related libraries.
> 
> I need some help from the community. I am currently working
> on a project that deals with the following problem: I need to search for
> text or strings within documents uploaded by users (.pdf, .doc, etc.).
> Can anyone help me with this? It would be a great help!
> 
> Thank you!
> Regards,
> -- 
> 
> *Vishrut Mehta*
> International Institute of Information Technology,
> Gachibowli,Hyderabad-500032
