RE: Indexing with SnowballAnalyzer and multiple languages in a single index

2006-04-20 Thread jwang
You can have multiple languages in the same index. Just make sure that your language identification process is consistent. You might still get some false positives: for example, if a German root has the same letters as a French root but means something different, then it might still

RE: problem Indexing path of files

2006-04-04 Thread jwang
Use PerFieldAnalyzerWrapper and set KeywordAnalyzer for your path (and probably name) field. Reserve whatever analyzer you already have for the actual contents/metadata of the file. Search this ML for PerFieldAnalyzerWrapper to find examples. Jeff Wang diCarta, Inc. -Original Message- From: pe
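
The idea behind that advice can be sketched in plain Java without Lucene on the classpath. This is only a conceptual illustration: the class `PerFieldAnalysisSketch` and its methods are hypothetical names made up for this sketch and are not Lucene's API. In a real index, Lucene's own PerFieldAnalyzerWrapper and KeywordAnalyzer play these roles. The sketch shows why a path field should be kept as one token while the contents field is broken into words.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the per-field analyzer idea; NOT Lucene's API.
public class PerFieldAnalysisSketch {
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();
    private final Function<String, List<String>> fallback;

    public PerFieldAnalysisSketch(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    // Keyword-style analysis: the whole value is kept as a single token.
    public static List<String> keyword(String value) {
        return Collections.singletonList(value);
    }

    // Simple word analysis: lowercase, then split on anything that is not a letter or digit.
    public static List<String> words(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Register a field-specific analysis function.
    public void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Use the field-specific function if one is registered, else the fallback.
    public List<String> analyze(String field, String value) {
        return perField.getOrDefault(field, fallback).apply(value);
    }

    public static void main(String[] args) {
        PerFieldAnalysisSketch analysis = new PerFieldAnalysisSketch(PerFieldAnalysisSketch::words);
        analysis.addAnalyzer("path", PerFieldAnalysisSketch::keyword);
        System.out.println(analysis.analyze("path", "/home/docs/Report.rtf"));
        System.out.println(analysis.analyze("contents", "Quarterly Report, 2006"));
    }
}
```

With the path kept as a single token, an exact-match query on the full path finds the file, whereas word-level analysis would split the path at every slash and dot.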

Commercial vendors monitoring this ML? was: Lucene Performance Issues

2006-03-28 Thread jwang
Weird, I was just about to comment on the fact that since posting that my organization has decided to use Lucene, I have gotten calls from two commercial vendors that wouldn't give me the time of day while I was doing my comparison analysis. Both of them referred to some random "colleague" in the busine

Re: Lucene CPU Utilization

2006-02-20 Thread jwang
We're going to run into this issue when dealing with some of our larger customers. We plan to run our indexers on separate CPUs and then throttle them with sleep(100), or some other value to be determined in testing. We also plan on doing this over 2 weekends, sin
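
A minimal sketch of the sleep-based throttling described above, assuming one indexing thread that pauses between documents. The class `ThrottledIndexer`, the `indexOne` callback, and the document list are hypothetical names for illustration. The 100 ms pause is the figure from the message and would need tuning in testing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: indexes documents one at a time, sleeping between them
// so the indexing thread does not saturate its CPU.
public class ThrottledIndexer {

    // Returns the number of documents handed to indexOne. Stops early if the
    // thread is interrupted while sleeping, and restores the interrupt flag.
    public static <T> int indexWithThrottle(List<T> docs, Consumer<? super T> indexOne, long pauseMillis) {
        int indexed = 0;
        for (T doc : docs) {
            indexOne.accept(doc);
            indexed++;
            // No need to pause after the last document.
            if (pauseMillis > 0 && indexed < docs.size()) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a.rtf", "b.doc", "c.txt");
        int n = indexWithThrottle(docs, d -> System.out.println("indexing " + d), 100);
        System.out.println("indexed " + n + " documents");
    }
}
```

A fixed pause per document is the simplest form of throttling. It trades total indexing time for lower sustained CPU load, which fits a plan to spread the work over a longer window.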

Re: Build vs. Buy?

2006-02-10 Thread jwang
The reason we don't use the Google appliance is that our company doesn't recommend which OS or hardware to run. It would look a little weird if we said, oh, you have to buy this hardware for our search engine, but for our core technology, feel free to deploy it anywhere you want. It just

Build vs. Buy?

2006-02-08 Thread jwang
I'm trying to upgrade our search functionality (currently, RTF/text only, and exact phrase match only) at my company, and have run into some concerns. Our 4 main formats are: RTF - javax.swing looks fine, we use those classes already. MS Word - I know that POI exists, but development on th