28 apr 2007 kl. 07.52 skrev Kun Hong:
karl wettin wrote:
27 apr 2007 kl. 14.11 skrev Erik Hatcher:
On Apr 27, 2007, at 6:39 AM, karl wettin wrote:
27 apr 2007 kl. 12.36 skrev Erik Hatcher:
Unless someone has some other tricks I'm not aware of, that is.
I guess it would be possible to add start/stop-tokens such as ^
and $ to the indexed text: "^ the $" and place a phrase query
with 0 slop.
True true. That'd work too.
Thanks for the replies and discussion.
I think I didn't express my problems correctly. The problem is I
want to
find documents containing only the "the" token in the title field,
but not
necessarily with only one appearance. For example, if the query is
"the",
I want to find documents whose title is "the", "the the" or "the
the the".
I'm not sure if you mean that it should treat all repetative tokens
as only one token? Then you are better of using a filter when
analyzing text you insert to the index: rather than creating one
token for each the in "the the the the the the" you only create one.
You might also want to use this filter when parsing user queries. (It
will be hard to find the band 'the the'.)
If not and what you write above is all you want to match, nothing
more, nothing less, then you could do something like this:
(dry coded and untested.)
int n = 3; // the; the the; the the the
String field = "title";
String token = "the";
BooleanQuery bq = new BooleanQuery();
for (int i=0;i<n;i++) {
Term[] terms = new Term[i+2];
terms[0] = new Term(field, "^");
for (int j=0;j<i;j++) {
terms[j+1] = new Term(field, token);
}
terms[i+2] = new Term(field, "$");
bq.add(new BooleanClause(new PhraseQuery(terms, 0), Orrcurs.SHOULD);
}
I hope this helps.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]