On 28 Apr 2007, at 07:52, Kun Hong wrote:

karl wettin wrote:

On 27 Apr 2007, at 14:11, Erik Hatcher wrote:


On Apr 27, 2007, at 6:39 AM, karl wettin wrote:
On 27 Apr 2007, at 12:36, Erik Hatcher wrote:

Unless someone has some other tricks I'm not aware of, that is.

I guess it would be possible to add start/stop-tokens such as ^ and $ to the indexed text: "^ the $" and place a phrase query with 0 slop.

True true.   That'd work too.

Thanks for the replies and discussion.

I think I didn't express my problem clearly. I want to find documents whose title field contains only the token "the", but not necessarily just once. For example, if the query is "the", I want to find documents whose title is "the", "the the" or "the the the".

I'm not sure whether you mean that all repeated tokens should be treated as a single token. If so, you are better off using a filter when analyzing the text you add to the index: rather than creating one token for each "the" in "the the the the the the", you create only one. You might also want to use this filter when parsing user queries. (It will then be hard to find the band 'the the'.)
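
A rough sketch of that collapsing step, in plain Java with no Lucene classes involved. CollapseRepeats and collapse are made-up names for illustration; in a real analyzer this logic would sit inside a token filter, and it assumes the tokens have already been normalized (lowercased, for example).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollapseRepeats {
    // Hypothetical helper: drops every token that equals the token
    // immediately before it, so "the the the" becomes "the".
    public static List<String> collapse(List<String> tokens) {
        List<String> out = new ArrayList<>();
        String previous = null;
        for (String token : tokens) {
            if (!token.equals(previous)) {
                out.add(token);
            }
            previous = token;
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [the, band]
        System.out.println(collapse(Arrays.asList("the", "the", "the", "band")));
    }
}
```

Only adjacent repeats are dropped, so "the band the" keeps both occurrences of "the".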

If not, and what you write above is exactly what you want to match, nothing more, nothing less, then you could do something like this:

(dry coded and untested.)

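import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;

int n = 3; // the; the the; the the the
String field = "title";
String token = "the";
BooleanQuery bq = new BooleanQuery();
for (int count = 1; count <= n; count++) {
  // One exact phrase per repetition count: "^ the $", "^ the the $", ...
  PhraseQuery pq = new PhraseQuery();
  pq.add(new Term(field, "^"));
  for (int j = 0; j < count; j++) {
    pq.add(new Term(field, token));
  }
  pq.add(new Term(field, "$"));
  pq.setSlop(0); // no other tokens allowed in between
  bq.add(pq, BooleanClause.Occur.SHOULD);
}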
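
The query above only works if the title was indexed with the ^ and $ markers around it, and if the analyzer keeps them as tokens (a whitespace-based analyzer would; many others drop punctuation). Here is a small plain-Java sketch of that index-side step and of what the 0-slop phrases end up matching. AnchoredTitle, anchor, phrase and matches are made-up names, and the whitespace split only stands in for the analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AnchoredTitle {
    // Wraps the raw title in the boundary markers before it is analyzed.
    public static String anchor(String title) {
        return "^ " + title.trim() + " $";
    }

    // Builds the token sequence "^ token ... token $" with `count` repetitions.
    public static List<String> phrase(String token, int count) {
        List<String> p = new ArrayList<>();
        p.add("^");
        for (int i = 0; i < count; i++) {
            p.add(token);
        }
        p.add("$");
        return p;
    }

    // True if the anchored title is exactly one of the phrases for 1..n repetitions.
    // Because both ends are pinned, a 0-slop phrase match covers the whole title.
    public static boolean matches(String title, String token, int n) {
        List<String> tokens = Arrays.asList(anchor(title).split("\\s+"));
        for (int count = 1; count <= n; count++) {
            if (tokens.equals(phrase(token, count))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("the the", "the", 3));         // true
        System.out.println(matches("the band", "the", 3));        // false
        System.out.println(matches("the the the the", "the", 3)); // false, more than n repetitions
    }
}
```

Note that n caps the number of repetitions: a title with more than n copies of the token will not match unless n is raised.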


I hope this helps.

