30 apr 2007 kl. 02.05 skrev Kun Hong:
I'm not sure if you mean that it should treat all repetative
tokens as only one token? Then you are better of using a filter
when analyzing text you insert to the index: rather than creating
one token for each the in "the the the the the the" you only
create one. You might also want to use this filter when parsing
user queries. (It will be hard to find the band 'the the'.)
I can't just use filters because I have to cater for other titles
that are not just stop words, which
should be analyzed normally. (I know this requirement is a bit fussy).
You might want to consider using two fields. One with "normal
analysis" and one that filters out everything by titles that contain
nothing but one word repeating over and over again. Then it would be
sufficient with a single TermQuery to match, and no more need to hack
in the extra start- and stop tokens.
--
karl
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]