Of course, the answer is "it depends" <G>. This doesn't sound like a very big index, so the first thing I'd try is making the index more complex and keeping the queries as simple as possible. This assumes that search response time is what you care about and that indexing speed and index size matter much less. And indexing speed won't be a problem with an index this size, IMO.
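To make the index-side option concrete, here is a minimal sketch in plain Python. It is not Lucene code: the synonym table, the function names, and the sample documents are all made up for illustration. The idea it shows is that each synonym is stored at the same token position as the word it expands (in Lucene terms, a position increment of 0), so an unexpanded query still matches and phrase queries keep working.

```python
from collections import defaultdict

# Hypothetical synonym table, for illustration only.
SYNONYMS = {"quick": ["fast", "speedy"], "car": ["automobile"]}


def analyze(text, synonyms=SYNONYMS):
    """Yield (term, position) pairs; synonyms share the original term's position."""
    for pos, word in enumerate(text.lower().split()):
        yield word, pos
        for syn in synonyms.get(word, []):
            # Same position as the original word (position increment of 0).
            yield syn, pos


def build_index(docs):
    """Build a positional inverted index: term -> {doc_id -> set of positions}."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        for term, pos in analyze(text):
            index[term][doc_id].add(pos)
    return index


def phrase_search(index, phrase):
    """Return ids of docs that contain the words of `phrase` at consecutive positions."""
    words = phrase.lower().split()
    if not words:
        return set()
    hits = set()
    for doc_id, positions in index.get(words[0], {}).items():
        for start in positions:
            # Every following word must sit exactly i positions after the first.
            if all(start + i in index.get(w, {}).get(doc_id, ())
                   for i, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits


# Sample documents, also made up for illustration.
docs = {1: "a quick red car", 2: "the car was not quick"}
index = build_index(docs)
```

With this index, `phrase_search(index, "fast red automobile")` finds document 1 even though neither "fast" nor "automobile" appears in its text, and the query itself needed no expansion. The cost is a larger index and a re-index whenever the synonym table changes.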
Lucene in Action has an example of synonym injection into the indexing stream that preserves proximity queries (SpanQueries); you really want to look at it if you haven't already <G>. Take care that your indexing analyzer and your search analyzer are consistent with each other, so that the terms produced from a query match the terms stored in the index.

Also, if you haven't already, I really, really recommend that you get a copy of Luke (google luke and lucene) so you can examine your index and see how queries behave.

I think making the index more complex is actually a lot less work than expanding the queries, but I don't have any real facts to back that up, FWIW.

Best
Erick

On 9/15/06, Luis Rodrigo Aguado <[EMAIL PROTECTED]> wrote:
Hi all, I have used Lucene so far for solving toy examples and making tutorial examples, but now I am facing my first real-world, high-quality application. I need to manage around 50,000 docs, ranging from a few lines to a couple of pages. I also need to handle lemmas and synonyms, and this is where my main doubts arise. I have considered two options: adding the synonyms and lemmas to the indexes and keeping the queries simple, or expanding the queries with these lemmas and synonyms and keeping the indexes simple. Is one of the two preferable to the other? What are the benefits of each? Thanks in advance!