Re: Introduction to PyLucene Community and some doubts

2013-06-11 Thread Thomas Koch
Hi, I suggest you have a look at Apache Tika: http://tika.apache.org You can easily call a "java -jar tika.jar" command via Python tools like os.popen to convert files in various formats to text. There's even a Python wrapper based on JCC, but I'm not sure if that's still maintained: http://red
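
A minimal sketch of the approach described above, using subprocess rather than os.popen. It assumes a Java runtime is on the PATH and that the Tika application jar is saved as "tika.jar" in the working directory (the real file name usually carries a version number). The helper names build_tika_command and extract_text are hypothetical, chosen only for illustration.

```python
import subprocess


def build_tika_command(document_path, tika_jar="tika.jar"):
    """Build the argument list for running the Tika app in plain-text mode.

    "tika.jar" is an assumed file name; point tika_jar at the jar you
    actually downloaded.
    """
    # --text asks the Tika command-line app for plain-text output.
    return ["java", "-jar", tika_jar, "--text", document_path]


def extract_text(document_path, tika_jar="tika.jar"):
    """Run Tika on one document and return the extracted text.

    Raises subprocess.CalledProcessError if the java process fails.
    """
    result = subprocess.run(
        build_tika_command(document_path, tika_jar),
        capture_output=True,
        check=True,
    )
    # The output encoding depends on the platform defaults, so undecodable
    # bytes are replaced instead of raising an error.
    return result.stdout.decode("utf-8", errors="replace")
```

The returned string can then be handed to whatever indexing code you use, for example text = extract_text("report.pdf"), where "report.pdf" is a placeholder file name.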

Re: Introduction to PyLucene Community and some doubts

2013-06-11 Thread Vishrut Mehta
Hello sir, Thank you for the quick reply. I want to integrate this functionality with web2py, so I would need to stick with Python and PyLucene. So the method you are describing is: extract text from all the documents using different Python libraries, then index the data, then search the