Lucene should work quite well for this. You'll just need some
infrastructure around it to get each file and extract its contents (see
Lucene's Tika project). And, yes, Lucene is thread-safe, so you can
index safely as you describe.
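
To make the "infrastructure around it" part concrete, here is a minimal sketch that uses only the JDK. It assumes the files are plain UTF-8 text (anything else would go through Tika first), and the names FileGatherer, FileRecord, and gather are made up for illustration; they are not part of Lucene. Each worker thread reads one file, collects the four fields from the question, and hands them to a sink that must itself be thread-safe.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not a Lucene class.
class FileGatherer {

    // The four fields the question wants to keep for each file.
    static final class FileRecord {
        final String name;
        final long size;
        final long modifiedMillis;
        final String content;

        FileRecord(String name, long size, long modifiedMillis, String content) {
            this.name = name;
            this.size = size;
            this.modifiedMillis = modifiedMillis;
            this.content = content;
        }
    }

    // Reads one file fully into memory; fine for files of a few hundred KB.
    // Assumption: the file is UTF-8 text.
    static FileRecord read(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return new FileRecord(
                file.getFileName().toString(),
                bytes.length,
                Files.getLastModifiedTime(file).toMillis(),
                new String(bytes, StandardCharsets.UTF_8));
    }

    // Reads every regular file directly inside dir on a fixed-size thread pool
    // and passes each record to sink. Returns the number of files processed.
    static int gather(Path dir, int threads, Consumer<FileRecord> sink) throws Exception {
        List<Path> files;
        try (Stream<Path> listing = Files.list(dir)) {
            files = listing.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Path file : files) {
                pending.add(pool.submit(() -> {
                    sink.accept(read(file));
                    return null;
                }));
            }
            // Wait for all workers; get() rethrows any read failure.
            for (Future<?> task : pending) {
                task.get();
            }
        } finally {
            pool.shutdown();
        }
        return files.size();
    }
}
```

In a real setup the sink would build a Lucene Document from the record and call addDocument on a single IndexWriter shared by all threads, which is the part Lucene makes safe for you.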
On Oct 11, 2008, at 10:22 AM, Mag Gam wrote:
Hello All,
At my university we have over 20,000 small files ranging from 20k to
500k per directory and we would like to index them. I was wondering if
Lucene is the right tool for this? The information we would like to
keep is: filename, filesize, filedate, filecontent. Also, is it
possible to run