Intution behind adding n-grams is to boost naturally occurring larger phrases versus using phrase queries. For example, if I am searching for "united states of america", I want the search results to return the documents ordered as follows
Rank 1 - Documents containing all the words occurring together Rank 2 - Documents containing maximum number of words in the same sentence Rank 3 - Documents containing all the words but some might appear in the same sentence some may not Rank 4 - Documents containig atleast one or two words If we have a n-gram index, most probably document talking about "united states" gets preference over document containing "united" and "states" seperately. If I am correct, this can be achieved without using phrase queries. I am not sure if there is a better way to achieve the same effect. Thanks, Rajesh -----Original Message----- From: Andy Roberts [mailto:[EMAIL PROTECTED] Sent: Monday, July 18, 2005 5:56 PM To: java-user@lucene.apache.org Subject: Re: n-gram indexing On Monday 18 Jul 2005 21:27, Rajesh Munavalli wrote: > At what point do I add n-grams? Does the order in which I add n-grams > affect exact phrase queries later? My questions are > > (1) Should I add all the 1-grams followed by 2-grams followed by > 3-grams..etc sentence by sentence OR > > (2) Add all the 1 grams of entire document first before starting > 2-grams for the entire document? > > What is the general accepted notion of adding n-grams of a document? > > thanks, > > Rajesh I can't see any real advantage of storing n-grams explicitly. Just index the document and use phrase queries. Order is significant with phrase queries if I recall correctly, although you can use SpanNearQueries to look for unordered ngrams, although I don't know why you would want to! Perhaps if you explain a little more about what you are trying to achieve more generally, we can confirm that you don't need to mess with explicit indexing of indexing. Andy --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]