Look at the function daisy() in the package cluster.

require(cluster)
?daisy
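As a rough sketch of how this could look for the two questions below (mixed numeric/categorical indicators, and units), here is a small example. The data frame `countries`, its column names and its values are made up for illustration, and `k = 2` is an arbitrary choice. daisy() with metric = "gower" accepts factors alongside numeric columns and rescales each numeric column by its range, so under that assumption the income units should not change the dissimilarities. Plain Euclidean distance without standardization does depend on the units.

```r
library(cluster)

# Hypothetical toy data: three numeric indicators plus one categorical one.
countries <- data.frame(
  age    = c(40, 44, 31, 29),
  life   = c(72, 75, 64, 61),
  income = c(26, 36, 9, 7),                      # thousands of dollars
  treaty = factor(c("yes", "yes", "no", "no")),  # joined a treaty or not
  row.names = c("A", "B", "C", "D")
)

# Gower dissimilarity: handles numeric and factor columns together.
d_thousands <- daisy(countries, metric = "gower")

# Same data, but with income expressed in dollars.
in_dollars <- transform(countries, income = income * 1000)
d_dollars  <- daisy(in_dollars, metric = "gower")

# Gower range-scales each numeric column, so the units drop out.
same_gower <- isTRUE(all.equal(as.vector(d_thousands), as.vector(d_dollars)))

# Unstandardized Euclidean distance on the numeric columns is unit-dependent.
e_thousands <- daisy(countries[1:3], metric = "euclidean")
e_dollars   <- daisy(in_dollars[1:3], metric = "euclidean")
same_euclid <- isTRUE(all.equal(as.vector(e_thousands), as.vector(e_dollars)))

# Partition the countries around medoids using the Gower dissimilarities.
fit <- pam(d_thousands, k = 2)
print(fit$clustering)
```

Any method that takes a dissimilarity object could be used in place of pam(), e.g. hclust(); pam() is used here only because it ships with the same package.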
Jean

Lorenzo Isella wrote on 09/02/2011 11:50:04 AM:
>
> Dear All,
> I will be confronted (relatively soon) with the following problem:
> given a set of known statistical indicators {s_i}, i=1,2...N for N
> countries, I would like to be able to do some data clustering, i.e.
> determining the best way to partition the N countries according to their
> known properties, encoded by the {s_i} set of indicators for those
> countries.
> Some properties of these countries may be categorical or anyway
> non-numerical variables (e.g. the fact of belonging/not belonging to a
> certain group; joining/not joining a certain treaty etc...). I have seen
> some data clustering examples, but without categorical variables, and I
> wonder if this is an inherent limitation of the methodology (off the top
> of my head, I would not know how to define the distance between
> non-numerical variables). Any suggestions about the general methodology
> and R packages/code snippets are really appreciated.
> And also: do the units in which I express a statistical indicator play a
> role? For instance: for 2 given countries I could have the average age
> of the population, the average life expectancy and the average income
> per year in thousands of dollars. This would give rise e.g. to
> (40,72,26) and (44,75,36), but if I measure the average income in
> dollars, then I would get (40,72,26000) and (44,75,36000). Would the units
> that I choose for an indicator have an impact on the clustering results? They
> should not, in my view, since the income does not change whichever way I
> express it, but I am not sure about the algorithm results.
> Many thanks
>
> Lorenzo

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.