Susan wrote:

If my data looks like this:

word 1: 100    101     101    102    102     102    106    106
word 2: 101    104     106    110    113     129    131    148
word 3: 101    153     175    180    381
word 4: 106    110     113    122    131     137    142    148
word 5: 120    165     169

where word 1 to word 5 represent different words, and the numbers
represent different attributes of the words.

How can I calculate similarity between words?

I am assuming that the numbers are independent, so that 101 and 102 are no more closely related than 101 and 175. That is probably a bad assumption, because I see that an attribute can apply to the same word multiple times.

1. Per word, concatenate the chr() of the attribute values, to make a string.
2. Calculate the Levenshtein distance (or edit distance) between the strings.
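
A rough sketch of those two steps, using the data from the question. The names %attrs, %str and levenshtein() are made up for this example, and the distance routine is a plain dynamic-programming version written out by hand so that nothing beyond core Perl is needed; a CPAN module could presumably be used instead. Note that edit distance depends on the order of the attributes, which is fine here only because each list is already sorted.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Attribute lists copied from the question (hypothetical variable name).
my %attrs = (
    'word 1' => [100, 101, 101, 102, 102, 102, 106, 106],
    'word 2' => [101, 104, 106, 110, 113, 129, 131, 148],
    'word 3' => [101, 153, 175, 180, 381],
    'word 4' => [106, 110, 113, 122, 131, 137, 142, 148],
    'word 5' => [120, 165, 169],
);

# Step 1: turn each attribute list into a string, one character per value.
my %str = map { $_ => join '', map { chr } @{ $attrs{$_} } } keys %attrs;

# Step 2: Levenshtein distance, keeping only two rows of the table.
sub levenshtein {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my @prev = (0 .. scalar @t);    # distance from the empty prefix of $s
    for my $i (1 .. scalar @s) {
        my @cur = ($i);
        for my $j (1 .. scalar @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution or match
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Print the distance for every pair of words; smaller means more similar.
my @words = sort keys %str;
for my $i (0 .. $#words) {
    for my $j ($i + 1 .. $#words) {
        printf "%s vs %s: %d\n", $words[$i], $words[$j],
            levenshtein($str{ $words[$i] }, $str{ $words[$j] });
    }
}
```

The strings are only compared, never printed, so values above 255 (such as 381) are not a problem here. Because the lists differ in length, it may be worth dividing each distance by the length of the longer list before comparing pairs.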

--
Ruud
