Re: Search for docs containing only a certain word in a specified field?

karl wettin Sat, 28 Apr 2007 16:00:10 -0700


On 28 Apr 2007, at 07:52, Kun Hong wrote:

karl wettin wrote:
On 27 Apr 2007, at 14:11, Erik Hatcher wrote:
On Apr 27, 2007, at 6:39 AM, karl wettin wrote:
On 27 Apr 2007, at 12:36, Erik Hatcher wrote:
Unless someone has some other tricks I'm not aware of, that is.
I guess it would be possible to add start/stop tokens such as ^ and $ to the indexed text: "^ the $" and place a phrase query with 0 slop.
True true. That'd work too.
Thanks for the replies and discussion.
I think I didn't express my problem correctly. The problem is that I want to find documents containing only the "the" token in the title field, but not necessarily with only one appearance. For example, if the query is "the", I want to find documents whose title is "the", "the the" or "the the the".

I'm not sure if you mean that all repeated tokens should be treated as a single token. If so, you are better off using a filter when analyzing the text you insert into the index: rather than creating one token for each "the" in "the the the the the the", you create only one. You might also want to use this filter when parsing user queries. (It will be hard to find the band 'the the'.)
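
To illustrate the collapsing idea, here is a minimal sketch in plain Java. It is not a Lucene TokenFilter; the class name CollapseRepeats and the method collapse are made up for this example, and it works on a list of strings rather than a token stream. A real filter would apply the same comparison against the previous token inside the analysis chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: illustrates the logic only.
class CollapseRepeats {

    /** Returns the tokens with each run of identical consecutive tokens reduced to one. */
    static List<String> collapse(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        String previous = null;
        for (String token : tokens) {
            // Emit a token only when it differs from the one just before it.
            if (!token.equals(previous)) {
                out.add(token);
            }
            previous = token;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collapse(Arrays.asList("the", "the", "the")));        // [the]
        System.out.println(collapse(Arrays.asList("the", "cat", "cat", "the"))); // [the, cat, the]
    }
}
```

With this applied at both index and query time, the titles "the" and "the the the" become indistinguishable, which is exactly the trade-off mentioned above.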

If not, and what you write above is all you want to match, nothing more, nothing less, then you could do something like this:


(dry coded and untested.)

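import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;

int n = 3; // the; the the; the the the
String field = "title";
String token = "the";
BooleanQuery bq = new BooleanQuery();
for (int i = 1; i <= n; i++) {
  // one exact phrase per repetition count: ^, then i copies of the token, then $
  PhraseQuery pq = new PhraseQuery();
  pq.add(new Term(field, "^"));
  for (int j = 0; j < i; j++) {
    pq.add(new Term(field, token));
  }
  pq.add(new Term(field, "$"));
  pq.setSlop(0); // no gaps or reordering allowed
  bq.add(pq, BooleanClause.Occur.SHOULD); // any one of the phrases may match
}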
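
The query only works if the title text was indexed with the ^ and $ markers around it, and if the analyzer keeps those two characters as tokens (a whitespace-based analyzer should; many others strip punctuation, so that part is an assumption to verify). The following plain-Java sketch shows the index-side wrapping and which titles the n = 3 query is meant to accept. The class BoundaryMarkers and its methods are hypothetical, and string equality stands in for the zero-slop phrase match.

```java
// Hypothetical helper, not part of Lucene: mimics the marker scheme with plain strings.
class BoundaryMarkers {
    static final String START = "^";
    static final String END = "$";

    /** Wraps the raw field text in start/end markers, as would be done before indexing. */
    static String wrap(String text) {
        String normalized = text.trim().replaceAll("\\s+", " ");
        return START + " " + normalized + " " + END;
    }

    /** Builds the exact phrase for count repetitions of token, e.g. "^ the the $". */
    static String phrase(String token, int count) {
        StringBuilder sb = new StringBuilder(START);
        for (int i = 0; i < count; i++) {
            sb.append(' ').append(token);
        }
        return sb.append(' ').append(END).toString();
    }

    /** True if the wrapped title equals the phrase for some count in 1..maxRepeats. */
    static boolean matches(String title, String token, int maxRepeats) {
        String wrapped = wrap(title);
        for (int i = 1; i <= maxRepeats; i++) {
            if (wrapped.equals(phrase(token, i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(wrap("the the"));                     // ^ the the $
        System.out.println(matches("the the the", "the", 3));    // true
        System.out.println(matches("the cat", "the", 3));        // false
        System.out.println(matches("the the the the", "the", 3)); // false
    }
}
```

Because both markers are part of every phrase, a title with any other token, or with more than n repetitions, cannot match any of the clauses.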


I hope this helps.

