Hi,
I suggest you have a look at Apache Tika: http://tika.apache.org
You can easily call a "java -jar tika.jar" command from Python (e.g. via
os.popen) to convert files in many formats to plain text.
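Here is a minimal sketch of that call. It assumes Java is on the PATH and the Tika app jar has been downloaded locally. The file name tika-app.jar is a placeholder, since real releases carry a version number. The sketch uses subprocess.run in place of os.popen: passing an argument list avoids shell-quoting problems with file names that contain spaces, and check=True raises an error if Tika exits with a failure.

```python
import subprocess
from pathlib import Path

# Placeholder path (assumption): point this at your downloaded Tika app jar,
# whose real name includes a version number.
TIKA_JAR = Path("tika-app.jar")


def build_tika_command(jar_path, input_path):
    """Return the argv list that asks the Tika app to print plain text."""
    # --text tells the Tika app to output the document body as plain text.
    return ["java", "-jar", str(jar_path), "--text", str(input_path)]


def extract_text(input_path, jar_path=TIKA_JAR):
    """Run Tika on one file and return the extracted text.

    Requires Java on the PATH; raises subprocess.CalledProcessError
    if Tika exits with a non-zero status.
    """
    result = subprocess.run(
        build_tika_command(jar_path, input_path),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output to str
        check=True,
    )
    return result.stdout
```

Usage would be something like text = extract_text("report.pdf"), where report.pdf is a hypothetical input file.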
There's even a Python wrapper based on JCC, but I'm not sure whether it's still
maintained:
http://red
Hello sir,
Thank you for the quick reply. I want to integrate this functionality with
web2py, so I need to stick with Python and PyLucene. So the approach you are
suggesting is: extract the text from all the documents using different
Python libraries, then index the data, then search the
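To make those three steps concrete, here is a toy sketch in plain Python. It is not PyLucene's API. A dictionary-based inverted index stands in for the Lucene index, and the document texts are hypothetical. In a real setup, step one would use Tika or per-format libraries, and steps two and three would go through Lucene's IndexWriter and IndexSearcher classes, which PyLucene exposes.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Step 2: map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Step 3: return ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Step 1 (assumed done elsewhere): text already extracted from each file.
docs = {
    "a.pdf": "Apache Lucene is a search library",
    "b.doc": "web2py is a Python web framework",
}
index = build_index(docs)
```

With this data, search(index, "python web") returns {"b.doc"}.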
Hello Everyone,
I am Vishrut Mehta, currently a third-year student at IIIT
Hyderabad, India. I have been contributing to open source for two years
and have also contributed to organizations like E-cidadania, Sahana
Software Foundation, GNOME, etc. I am very interested in Search e