Lucene should work quite well for this. You'll just need some infrastructure around it to fetch each file and extract its contents (see Lucene's Tika project). And yes, Lucene is thread-safe: a single IndexWriter can be shared by several indexing threads, so you can index safely as you describe.
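
To make the "infrastructure around it" a bit more concrete, here is a rough sketch of the file-gathering side only, using nothing but the JDK. It is not Lucene code: RecordSink, FileRecord and ParallelFileScan are hypothetical names made up for illustration, and the sink stands in for whatever wraps your shared IndexWriter. It also assumes the files are plain UTF-8 text; for other formats the content extraction is where something like Tika would go.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical stand-in for the index; a real one would wrap one shared IndexWriter. */
interface RecordSink {
    void add(FileRecord record) throws IOException;
}

/** The four fields named in the question: filename, filesize, filedate, filecontent. */
final class FileRecord {
    final String name;
    final long size;
    final long modifiedMillis;
    final String content;

    FileRecord(String name, long size, long modifiedMillis, String content) {
        this.name = name;
        this.size = size;
        this.modifiedMillis = modifiedMillis;
        this.content = content;
    }
}

final class ParallelFileScan {
    /**
     * Reads every regular file under root on a fixed-size thread pool and hands
     * each record to the sink. The sink is called from several threads at once,
     * so it must be thread-safe. Returns the number of files handed over.
     */
    static int scan(Path root, int threads, RecordSink sink) throws Exception {
        // Collect the file list up front so the directory walk itself stays single-threaded.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Path file : files) {
                // Returning null makes this a Callable, so IOException can propagate.
                pending.add(pool.submit(() -> {
                    sink.add(read(file));
                    return null;
                }));
            }
            // Wait for every task; get() rethrows the first failure it meets.
            for (Future<?> task : pending) {
                task.get();
            }
            return files.size();
        } finally {
            pool.shutdown();
        }
    }

    /** Assumption: files are small UTF-8 text, so reading them whole is acceptable. */
    static FileRecord read(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return new FileRecord(
                file.toString(),
                bytes.length,
                Files.getLastModifiedTime(file).toMillis(),
                new String(bytes, StandardCharsets.UTF_8));
    }
}
```

With files in the 20k to 500k range, reading each one fully into memory per task should be fine. The thread count is something to measure rather than guess, since the disk may become the bottleneck before the CPU does.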

On Oct 11, 2008, at 10:22 AM, Mag Gam wrote:

Hello All,

At my university we have over 20,000 small files ranging from 20k to
500k per directory and we would like to index them. I was wondering if
Lucene is the right tool for this? The information we would like to
keep is: filename, filesize, filedate, filecontent. Also, is it
possible to run the initial index in multithreaded mode since we are
talking about many directories with similar contents?

TIA

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

--------------------------
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com


Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

