Your intuition is right, but I can't think of any reason why you need to add the n-grams at indexing time, or why using phrase queries would be a bad thing in this case. When you get a multiword query, construct the n-grams of the query words as multiple phrase queries and search with a BooleanQuery of all those phrases (with bigger boosts given to longer phrases).
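
A minimal sketch of that query construction, in plain Java with no Lucene dependency: it only builds the query string (quoted phrases, ^ for boosts) that you would then hand to a query parser. The class name NGramPhraseQuery, the method build, the stop-word set, and the boost scale (10 for 2-grams, 100 for 3-grams, and so on) are all assumptions made for illustration, not anything Lucene provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: turns a multiword query into single terms plus
// boosted phrase clauses for every n-gram of the query words.
final class NGramPhraseQuery {

    static String build(String userQuery, Set<String> stopWords) {
        // Lowercase, split on whitespace, and drop stop words (assumed list).
        List<String> words = new ArrayList<>();
        for (String w : userQuery.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!w.isEmpty() && !stopWords.contains(w)) {
                words.add(w);
            }
        }

        // Single terms come first, unboosted.
        List<String> clauses = new ArrayList<>(words);

        // Then every n-gram for n = 2..words.size(), with the boost
        // multiplied by 10 per extra word (assumed scale; it would
        // overflow a long for very long queries).
        long boost = 10;
        for (int n = 2; n <= words.size(); n++) {
            for (int start = 0; start + n <= words.size(); start++) {
                String phrase = String.join(" ", words.subList(start, start + n));
                clauses.add("\"" + phrase + "\"^" + boost);
            }
            boost *= 10;
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(build("united states of america", Set.of("of")));
    }
}
```

Called with "united states of america" and a stop-word set containing only "of", build returns three single terms, two 2-gram phrases boosted by 10, and one 3-gram phrase boosted by 100. Whether a phrase such as "states america" matches text that reads "states of america" depends on how your analyzer handles stop-word positions, so treat the stop-word step as something to check against your own index.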

Using your example of "united states of america", generate a query that looks something like...

   united states america "united states"^10 "states america"^10 "united states america"^100

: The intuition behind adding n-grams is to boost naturally occurring larger
: phrases versus using phrase queries. For example, if I am searching for
: "united states of america", I want the search results to return the
: documents ordered as follows:
:
: Rank 1 - Documents containing all the words occurring together
: Rank 2 - Documents containing the maximum number of words in the same
:          sentence
: Rank 3 - Documents containing all the words, where some might appear in
:          the same sentence and some may not
: Rank 4 - Documents containing at least one or two words
:
: If we have an n-gram index, a document talking about "united states"
: most probably gets preference over a document containing "united" and
: "states" separately. If I am correct, this can be achieved without using
: phrase queries. I am not sure if there is a better way to achieve the
: same effect.
:
: Thanks,
:
: Rajesh
:
:
: -----Original Message-----
: From: Andy Roberts [mailto:[EMAIL PROTECTED]
: Sent: Monday, July 18, 2005 5:56 PM
: To: java-user@lucene.apache.org
: Subject: Re: n-gram indexing
:
: On Monday 18 Jul 2005 21:27, Rajesh Munavalli wrote:
: > At what point do I add n-grams? Does the order in which I add n-grams
: > affect exact phrase queries later? My questions are
: >
: > (1) Should I add all the 1-grams followed by 2-grams followed by
: > 3-grams, etc., sentence by sentence OR
: >
: > (2) Add all the 1-grams of the entire document first before starting
: > 2-grams for the entire document?
: >
: > What is the generally accepted notion of adding n-grams of a document?
: >
: > thanks,
: >
: > Rajesh
:
: I can't see any real advantage of storing n-grams explicitly. Just index
: the document and use phrase queries.
: Order is significant with phrase queries, if I recall correctly. You can
: use SpanNearQueries to look for unordered n-grams, although I don't know
: why you would want to!
:
: Perhaps if you explain a little more about what you are trying to
: achieve more generally, we can confirm that you don't need to mess with
: explicit indexing of n-grams.
:
: Andy
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: [EMAIL PROTECTED]
: For additional commands, e-mail: [EMAIL PROTECTED]

-Hoss