Have you tried using the cosine of the angle between two
observations as the similarity measure? The cosine ignores vector
length, so if you want to account for magnitudes, there is something
called the Jaccard coefficient (if I remember correctly) that can be
used.
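
To make the cosine suggestion concrete, here is a minimal sketch in plain Python. The function name and the three observations are made up for illustration, laid out as (Item-1, Item-2, Item-3) from the quoted message; nothing here comes from the original data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical observations: (Item-1, Item-2, Item-3)
obs_a = [100, 0.1, 1]
obs_b = [100, 0.9, 10]   # same Item-1, very different Item-2 and Item-3
obs_c = [500, 0.1, 1]    # Item-1 five times larger, other items unchanged

print(cosine_similarity(obs_a, obs_b))
print(cosine_similarity(obs_a, obs_c))
```

On unscaled values both similarities should come out very close to 1, because Item-1 dominates the direction of every vector. So the cosine alone probably does not remove the scaling concern raised in the quoted message.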

Darin

On Mon, Jan 26, 2009 at 10:41:40AM +0100, mau...@alice.it wrote:
> I am going to try out a tentative clustering of some feature vectors.
> The range of values spanned by the three items making up the feature vector 
> is quite different:
> 
> Item-1 goes roughly from 70 to 525 (integer numbers only)
> Item-2 is in-between 0 and 1 (all real numbers between 0 and 1)
> Item-3 goes from 1 to 10 (integer numbers only)
> 
> In order to spread out Item-2 even further I might try to replace Item-2 with 
> Log10(Item-2).
> 
> My concern is that, regardless of the distance measure used, the item whose 
> order of magnitude is the highest may carry the highest weight in the process 
> of calculating the similarity matrix, thereby fading out the influence of 
> the items with smaller variation in the resulting clusters.
> Should I normalize all feature vector elements to 1 in advance of generating 
> the similarity matrix?
> 
> Thank you so much.
> Maura 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
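
On the normalization question in the quoted message, here is a rough sketch of one common option: min-max rescaling of each item to [0, 1] before computing distances. It assumes the ranges stated in the message (70 to 525, 0 to 1, 1 to 10) are the true bounds. The helper names and the two observations are hypothetical, and Euclidean distance is used only as an example measure.

```python
import math

# Assumed (min, max) bounds per item, taken from the ranges in the message.
RANGES = [(70, 525), (0.0, 1.0), (1, 10)]

def rescale(obs, ranges=RANGES):
    """Min-max rescale each item of one observation to the interval [0, 1]."""
    return [(x - lo) / (hi - lo) for x, (lo, hi) in zip(obs, ranges)]

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical observations: (Item-1, Item-2, Item-3)
obs_a = [100, 0.1, 1]
obs_b = [110, 0.9, 10]

raw_distance = euclidean(obs_a, obs_b)
scaled_distance = euclidean(rescale(obs_a), rescale(obs_b))
print(raw_distance, scaled_distance)
```

After rescaling, each item can contribute at most 1 to the per-item difference, so Item-1 no longer outweighs the other two simply because of its units. Standardizing each item to zero mean and unit variance is another option. Which one suits the data better depends on outliers and on how the items are distributed.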