You can have multiple languages in the same index. Just make sure that
your language identification process is consistent.
You might still get some false positives. For example, if a German root
has the same letters as a French root but means something different,
then it might still
Use the PerFieldAnalyzerWrapper and set the analyzer for your path (and
probably name) field to KeywordAnalyzer. Reserve whatever analyzer you
already have for the actual contents/metadata of the file.
Search this mailing list for PerFieldAnalyzerWrapper to find examples.
Jeff Wang
diCarta, Inc.
Weird, I was just about to comment on the fact that since posting that
my organization has decided to use Lucene, I have gotten calls from two
commercial vendors that didn't give me the time of day while I was
doing my comparison analysis.
Both of them referred to some random "colleague" in the busine
We're going to run into this issue when dealing with some of our larger
customers.
What we plan on doing is to put our indexers on separate CPUs, and then
throttle them by using sleep(100) or some other number to be determined in
testing. We also plan on doing this over 2 weekends, sin
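
For what it's worth, the sleep-based throttling described above could look
roughly like the sketch below. This is only an illustration, not our actual
indexer: the class name ThrottledIndexer, the indexAll helper, and the
indexFn callback are all made up for the example, and it assumes that
sleep(100) means a pause of 100 milliseconds after each document.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class ThrottledIndexer {

    // Hypothetical helper: hands each document to indexFn, then pauses for
    // pauseMillis so the indexing thread does not monopolize its CPU.
    // Returns the number of documents that were handed to indexFn.
    public static <T> int indexAll(List<T> docs, Consumer<? super T> indexFn, long pauseMillis) {
        int indexed = 0;
        for (T doc : docs) {
            indexFn.accept(doc);
            indexed++;
            try {
                // Assumption: the pause is in milliseconds, to be tuned in testing.
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop early if asked to shut down.
                Thread.currentThread().interrupt();
                break;
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a.rtf", "b.doc", "c.txt");
        int n = indexAll(docs, d -> System.out.println("indexing " + d), 100);
        System.out.println(n + " documents indexed");
    }
}
```

In a real setup indexFn would wrap whatever call adds a document to the
index, and the pause would be whatever number testing settles on.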
The reason we don't use the Google appliance is that our company doesn't give
recommendations on which OS or hardware to run. It would look a little weird if we
said, oh, you have to buy this hardware for our search engine, but for our core
technology, feel free to deploy it anywhere you want. It just
I'm trying to upgrade our search functionality (currently, RTF/text
only, and exact phrase match only) at my company, and have run into some
concerns. Our 4 main formats are:
RTF - javax.swing looks fine, we use those classes already.
MS Word - I know that POI exists, but development on th