Is there any way to find frequent phrases without knowing what you are looking for?
I could index "A B C D E" as "A B C", "B C D", "C D E" etc, but that seems kind of clunky particularly if the phrase length is large. Is there any position offset magic that will surface frequent phrases automatically? thanks ryan On 3/22/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
Well, you don't index phrases, it's done for you. You should try something like the following.... Create a SpanNearQuery with your terms. Specify an appropriate slop (probably 0 assuming you want them all next to each other). Now use call getSpans and count <G>... You may have to do something with overlapping spans, but you'll need to experiment a bit to understand it. Erick On 3/22/07, Maryam <[EMAIL PROTECTED]> wrote: > > Hi, > > I know how to index terms in lucene, now I wanna see > how can I index phrases like "information retreival" > in lucene and calculate the number of times that > phrase has appeared in the document. Is there any way > to do it in Lucene? > > Thanks > > > > > ____________________________________________________________________________________ > It's here! Your new message! > Get new email alerts with the free Yahoo! Toolbar. > http://tools.search.yahoo.com/toolbar/features/mail/ > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > >
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]