Re: [R] Hints for Data Clustering

Jean V Adams Fri, 02 Sep 2011 10:58:55 -0700

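Look at the function daisy in the package cluster. It computes
dissimilarities for data that mix numeric and categorical (factor)
variables via Gower's coefficient, which also rescales each numeric
variable by its range, so a change of units does not alter the result.

library(cluster)
?daisy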
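For illustration only, here is a minimal sketch of how daisy() might be combined with a clustering function on mixed data. The data frame, its column names and values, and the choice of pam() with k = 2 are hypothetical assumptions, not taken from the question.

```r
library(cluster)

# Hypothetical toy data: country labels, indicator values and treaty
# membership are invented purely for illustration.
countries <- data.frame(
  age    = c(40, 44, 38, 29, 31),                       # average age (years)
  lifeex = c(72, 75, 80, 61, 64),                       # life expectancy (years)
  income = c(26, 36, 41, 4, 6),                         # average income (thousands of $)
  treaty = factor(c("yes", "yes", "no", "no", "yes")),  # categorical indicator
  row.names = c("A", "B", "C", "D", "E")
)

# Gower dissimilarity: each numeric column contributes |x_i - x_j| / range,
# each factor column contributes 0 (same level) or 1 (different level);
# the contributions are averaged over the columns.
d <- daisy(countries, metric = "gower")

# Partitioning around medoids works directly on a dissimilarity object.
fit <- pam(d, k = 2, diss = TRUE)
fit$clustering
```

pam() is used here because it accepts a dissimilarity matrix directly; hierarchical clustering with hclust(d) on the same object would be another option.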


Jean

Lorenzo Isella wrote on 09/02/2011 11:50:04 AM:
> 
> Dear All,
> I will be confronted (relatively soon) with the following problem:
> given a set of known statistical indicators {s_i}, i=1,2,...,N, for N
> countries, I would like to do some data clustering, i.e. determine the
> best way to partition the N countries according to their known
> properties, encoded by the set of indicators {s_i} for those countries.
> Some properties of these countries may be categorical or otherwise
> non-numerical variables (e.g. belonging/not belonging to a certain
> group, joining/not joining a certain treaty, etc.). I have seen some
> data clustering examples, but without categorical variables, and I
> wonder if this is an inherent limitation of the methodology (off the
> top of my head, I would not know how to define the distance between
> non-numerical variables). Any suggestions about the general methodology
> and R packages/code snippets are really appreciated.
> And also: do the units in which I express a statistical indicator play a
> role? For instance, for 2 given countries I could have the average age
> of the population, the average life expectancy and the average income
> per year in thousands of dollars. This would give rise e.g. to
> (40,72,26) and (44,75,36), but if I measure the average income in
> dollars, then I would get (40,72,26000) and (44,75,36000). Would the
> units that I choose for an indicator affect the clustering results? In
> my view they should not, since the income does not change whichever way
> I express it, but I am not sure what the algorithms actually do.
> Many thanks
> 
> Lorenzo
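On the units question: for distance-based clustering the units do matter unless the variables are rescaled. A small sketch, using the two countries from the question plus a third invented one (an assumption for illustration), compares unscaled Euclidean distances with standardised and Gower distances.

```r
library(cluster)

# Rows 1-2 are the example from the question; row 3 is hypothetical.
raw_k   <- data.frame(age = c(40, 44, 30), lifeex = c(72, 75, 60),
                      income = c(26, 36, 5))          # income in thousands of $
raw_usd <- transform(raw_k, income = income * 1000)   # same data, income in $

# Unscaled Euclidean distances change with the units, and in dollars
# the income column dominates every other indicator.
d_k   <- dist(raw_k)
d_usd <- dist(raw_usd)

# scale() centres each column and divides it by its standard deviation;
# a change of units multiplies a column by a constant, which this divides out.
s_k   <- dist(scale(raw_k))
s_usd <- dist(scale(raw_usd))
isTRUE(all.equal(as.vector(s_k), as.vector(s_usd)))   # TRUE

# Gower (range-scaled) dissimilarities are likewise unit-independent.
g_k   <- daisy(raw_k, metric = "gower")
g_usd <- daisy(raw_usd, metric = "gower")
isTRUE(all.equal(as.vector(g_k), as.vector(g_usd)))   # TRUE
```

With income in dollars, the unscaled distance between the first two countries is about 10000, essentially just the income difference, so a clustering on those distances would be driven by income alone. daisy(x, stand = TRUE) offers a similar standardisation for the Euclidean and Manhattan metrics, dividing by the mean absolute deviation rather than the standard deviation.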


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
