On 23 Mar 2007, at 04:25, Ryan McKinley wrote:

Is there any way to find frequent phrases without knowing what you are
looking for?

I think you are looking for association rules. Try searching for Levelwise-Scan.

Weka contains GPLed Java implementations of association rule miners.
CiteSeer is your best friend for white papers: http://citeseer.ist.psu.edu/cs
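To make the levelwise idea a bit more concrete, here is a rough sketch in plain Java. It is not taken from Weka or from any paper; the class and method names are made up for illustration, and it assumes the text is already tokenized and that tokens contain no spaces. It grows phrases one word at a time and only counts a phrase of length n if both of its sub-phrases of length n-1 were already frequent.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Lucene or Weka.
class LevelwisePhrases {

    // Returns every contiguous phrase occurring at least minCount times,
    // mapped to its number of occurrences. Tokens are joined with a single
    // space, so this assumes no token contains a space itself.
    static Map<String, Integer> frequentPhrases(List<String> tokens, int minCount) {
        Map<String, Integer> result = new HashMap<>();
        Set<String> previousLevel = new HashSet<>(); // frequent phrases of length n - 1

        for (int n = 1; n <= tokens.size(); n++) {
            Map<String, Integer> counts = new HashMap<>();
            for (int i = 0; i + n <= tokens.size(); i++) {
                if (n > 1) {
                    // Pruning step: a phrase cannot be frequent unless both its
                    // prefix and its suffix of length n - 1 are frequent.
                    String prefix = String.join(" ", tokens.subList(i, i + n - 1));
                    String suffix = String.join(" ", tokens.subList(i + 1, i + n));
                    if (!previousLevel.contains(prefix) || !previousLevel.contains(suffix)) {
                        continue;
                    }
                }
                String phrase = String.join(" ", tokens.subList(i, i + n));
                counts.merge(phrase, 1, Integer::sum);
            }
            // Keep only the phrases that reach the threshold at this level.
            counts.values().removeIf(c -> c < minCount);
            if (counts.isEmpty()) {
                break; // no frequent phrase of length n, so none can be longer
            }
            result.putAll(counts);
            previousLevel = new HashSet<>(counts.keySet());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("x", "y", "z", "x", "y", "w");
        System.out.println(frequentPhrases(tokens, 2));
    }
}
```

The loop stops as soon as a level produces no frequent phrase, so the maximum phrase length never has to be chosen up front.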


--
karl



I could index "A B C D E" as "A B C", "B C D", "C D E", etc., but that
seems kind of clunky, particularly if the phrase length is large.  Is
there any position offset magic that will surface frequent phrases
automatically?
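For reference, the fixed-length scheme described above amounts to something like the following plain-Java sketch. The class and method names are hypothetical, and it works on an already tokenized list rather than on a Lucene token stream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a Lucene class.
class ShingleSketch {

    // Returns every run of `size` consecutive tokens, joined with spaces.
    static List<String> shingles(List<String> tokens, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles(List.of("A", "B", "C", "D", "E"), 3));
    }
}
```

With the five tokens above and size 3 this yields "A B C", "B C D", "C D E". A text of N tokens produces N - size + 1 shingles for each size, and every phrase length of interest needs its own pass, which is where the clunkiness comes from.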

thanks
ryan


On 3/22/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
Well, you don't index phrases, it's done for you. You should try
something like the following....

Create a SpanNearQuery with your terms. Specify an appropriate
slop (probably 0 assuming you want them all next to each other).

Now call getSpans and count <G>... You may have to do
something with overlapping spans, but you'll need to experiment
a bit to understand it.
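To show what the counting boils down to, here is a plain-Java sketch that deliberately does not use the Lucene API, since the exact SpanNearQuery and getSpans signatures depend on the Lucene version. It counts in-order, adjacent matches (the slop 0 case) in a tokenized document; the class name, method name, and allowOverlap flag are made up for illustration.

```java
import java.util.List;

// Hypothetical helper for illustration only; it mimics zero-slop, in-order
// phrase matching on a token list and is not the Lucene implementation.
class PhraseCountSketch {

    // Counts the positions where `phrase` occurs as consecutive tokens.
    // With allowOverlap = false, a match consumes its tokens, so the next
    // match can only start after the end of the previous one.
    static int count(List<String> tokens, List<String> phrase, boolean allowOverlap) {
        if (phrase.isEmpty()) {
            return 0;
        }
        int hits = 0;
        int i = 0;
        while (i + phrase.size() <= tokens.size()) {
            if (tokens.subList(i, i + phrase.size()).equals(phrase)) {
                hits++;
                i += allowOverlap ? 1 : phrase.size();
            } else {
                i++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> doc = List.of("information", "retrieval", "is", "fun",
                "and", "information", "retrieval", "is", "hard");
        System.out.println(count(doc, List.of("information", "retrieval"), true));
    }
}
```

The overlap question only matters for phrases that can match themselves at a shifted position, such as "a a" in "a a a", which gives 2 matches with overlap and 1 without. How the spans returned by Lucene behave in that case is worth checking by experiment.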

Erick

On 3/22/07, Maryam <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I know how to index terms in Lucene. Now I want to see
> how I can index phrases like "information retrieval"
> in Lucene and calculate the number of times that
> phrase has appeared in the document. Is there any way
> to do it in Lucene?
>
> Thanks
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

