On Jun 22, 12:48 pm, [EMAIL PROTECTED] (Andrej Kastrin) wrote:

> I wrote a simple SQL query to count co-occurrences between words, but it
> runs very slowly on large datasets. So, it's time to do it with
> Perl. I need just a short tip to start out: which structure to use to
> count all possible co-occurrences between letters (e.g. A, B and C) under
> a particular document number. My dataset looks like the following:
>
> 1 A
> 1 B
> 1 C
> 1 B
> 2 A
> 2 A
> 2 B
> 2 C
> etc. till doc. number 100.000
>
> The result file should then be similar to:
> A B 4   ### 2 co-occurrences under doc. number 1 + 2 co-occurrences under doc. number 2
> A C 3   ### 1 co-occurrence under doc. number 1 + 2 co-occurrences under doc. number 2
> B C 3   ### 2 co-occurrences under doc. number 1 + 1 co-occurrence under doc. number 2

Maybe I'm just a little slow on the uptake, but I don't understand the
correlation between your sample input and sample output at all.  Where
did "A B 4" come from, and what does "2 co-occurrences" under doc
number 1 mean?  What is a co-occurrence?  I see one instance of "1 A"
and two instances of "1 B".  How does that translate to
"2 co-occurrences" of "A B"?

Can you explain your desired goal a little better?
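
In case it helps narrow things down, here is one reading that happens to
reproduce your sample numbers. It is only a guess on my part: within each
document, every occurrence of one letter is paired with every occurrence
of another, so a pair contributes count(X) * count(Y) for that document
(doc 1 has one A and two Bs, giving 2; doc 2 has two As and one B, giving
2; total 4). The sub name cooccurrences() and the in-memory sample list
are made up for illustration; for the real data you would read the lines
from a filehandle instead.

```perl
use strict;
use warnings;

# Hypothetical helper: takes lines of the form "<doc> <word>" and returns
# a hash ref mapping "X Y" to a co-occurrence total.
# ASSUMPTION (not confirmed by the original poster): a pair's score in one
# document is count(X) * count(Y), summed over all documents.
sub cooccurrences {
    my @lines = @_;

    # $count{$doc}{$word} = how many times $word appears under $doc
    my %count;
    for my $line (@lines) {
        my ($doc, $word) = split ' ', $line;
        next unless defined $word;    # skip blank or malformed lines
        $count{$doc}{$word}++;
    }

    my %pairs;
    for my $doc (keys %count) {
        my @words = sort keys %{ $count{$doc} };
        # Visit each unordered pair of distinct words once.
        for my $i (0 .. $#words - 1) {
            for my $j ($i + 1 .. $#words) {
                my ($x, $y) = @words[$i, $j];
                $pairs{"$x $y"} += $count{$doc}{$x} * $count{$doc}{$y};
            }
        }
    }
    return \%pairs;
}

# The sample data from the original message.
my @sample = ('1 A', '1 B', '1 C', '1 B', '2 A', '2 A', '2 B', '2 C');
my $pairs  = cooccurrences(@sample);
print "$_ $pairs->{$_}\n" for sort keys %$pairs;
```

With the sample above this prints "A B 4", "A C 3" and "B C 3". If that
is not what you mean by a co-occurrence, the hash-of-hashes for the
per-document counts is probably still a reasonable starting structure;
only the line that combines the two counts would change.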

Paul Lalli

