On 23 Mar 2007, at 04:25, Ryan McKinley wrote:
Is there any way to find frequent phrases without knowing what you are
looking for?
I think you are looking for association rules. Try searching for
Levelwise-Scan.
Weka contains GPLed Java code.
CiteSeer is your best friend for whitepapers:
http://citeseer.ist.psu.edu/cs
--
karl
I could index "A B C D E" as "A B C", "B C D", "C D E", etc., but that
seems kind of clunky, particularly if the phrase length is large. Is
there any position offset magic that will surface frequent phrases
automatically?
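
For reference, the "A B C", "B C D", "C D E" idea can be sketched in
plain Java without touching the index at all. This is only a rough
illustration, not Lucene code: ShingleCounter, countShingles and
frequent are made-up names, and it assumes whitespace-separated tokens
and a phrase length fixed up front.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene or Weka.
class ShingleCounter {

    // Counts every run of n consecutive whitespace-separated tokens.
    static Map<String, Integer> countShingles(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty() || n <= 0) {
            return counts;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i + n <= tokens.length; i++) {
            String shingle = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
            counts.merge(shingle, 1, Integer::sum);
        }
        return counts;
    }

    // Returns, in sorted order, the shingles seen at least minCount times.
    static List<String> frequent(Map<String, Integer> counts, int minCount) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

Calling countShingles("A B C D E", 3) gives the three shingles above,
each with a count of 1. The map holds one entry per distinct shingle,
so it grows with the text and has to be rebuilt for every phrase
length, which is the clunky part.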
thanks
ryan
On 3/22/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
Well, you don't index phrases yourself. Lucene records term positions
at index time, so phrase matching is done for you. You should try
something like the following....
Create a SpanNearQuery with your terms. Specify an appropriate
slop (probably 0 assuming you want them all next to each other).
Now call getSpans and count <G>... You may have to do
something with overlapping spans, but you'll need to experiment
a bit to understand it.
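
To make the overlap question concrete, here is a small plain-Java
sketch of what an in-order match with slop 0 boils down to on a token
list. It does not use the Lucene API: PhraseCounter and its count
method are made-up names, and how Lucene itself reports overlapping
spans is something to check by experiment rather than read off this
code.

```java
import java.util.List;

// Hypothetical stand-in for counting exact, in-order phrase matches.
class PhraseCounter {

    // Counts positions where phrase occurs as consecutive tokens.
    // With allowOverlap == false, a match consumes its tokens, so the
    // next match must start after the previous one ends.
    static int count(List<String> tokens, List<String> phrase, boolean allowOverlap) {
        if (phrase.isEmpty()) {
            return 0;
        }
        int hits = 0;
        int i = 0;
        while (i + phrase.size() <= tokens.size()) {
            if (tokens.subList(i, i + phrase.size()).equals(phrase)) {
                hits++;
                i += allowOverlap ? 1 : phrase.size();
            } else {
                i++;
            }
        }
        return hits;
    }
}
```

For the tokens "a a a" and the phrase "a a", this returns 2 when
overlaps are allowed and 1 when they are not, so decide up front which
of the two you mean by "number of times".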
Erick
On 3/22/07, Maryam <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I know how to index terms in Lucene. Now I want to see
> how I can index phrases like "information retrieval"
> in Lucene and calculate the number of times that
> phrase has appeared in the document. Is there any way
> to do it in Lucene?
>
> Thanks
>
---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]