Is there any way to find frequent phrases without knowing what you are
looking for?

I could index "A B C D E" as "A B C", "B C D", "C D E" etc, but that
seems kind of clunky particularly if the phrase length is large.  Is
there any position offset magic that will surface frequent phrases
automatically?

thanks
ryan


On 3/22/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
Well, you don't index phrases, it's done for you. You should try
something like the following....

Create a SpanNearQuery with your terms. Specify an appropriate
slop (probably 0 assuming you want them all next to each other).

Now use call getSpans and count <G>... You may have to do
something with overlapping spans, but you'll need to experiment
a bit to understand it.

Erick

On 3/22/07, Maryam <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I know how to index terms in lucene, now I wanna see
> how can I index phrases like "information retreival"
> in lucene and calculate the number of times that
> phrase has appeared in the document. Is there any way
> to do it in Lucene?
>
> Thanks
>
>
>
>
> 
____________________________________________________________________________________
> It's here! Your new message!
> Get new email alerts with the free Yahoo! Toolbar.
> http://tools.search.yahoo.com/toolbar/features/mail/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to