On Thu, Mar 31, 2011 at 08:48:02PM +0200, Hans Ekbrand wrote:
> On Thu, Mar 31, 2011 at 07:06:31PM +0100, Christian Hennig wrote:
> > Dear Hans,
> >
> > clara doesn't require a distance matrix as input (and therefore
> > doesn't require you to run daisy), it will work with the raw data
> > matrix using
> > Euclidean distances implicitly.
> > I can't tell you whether Euclidean distances are appropriate in this
> > situation (this depends on the interpretation and variables and
> > particularly on how they are scaled), but they may be fine at least
> > after some transformation and standardisation of your variables.
>
> The variables are unordered factors, stored as integers 1:9, where
>
> 1 means "Full-time employment"
> 2 means "Part-time employment"
> 3 means "Student"
> 4 means "Full-time self-employee"
> ...
>
> Do Euclidean distances make sense on unordered factors coded as
> integers?

To be clear, here is an extract:

> my.df.full[900:910, 16:19]
    PL210F.first.year PL210G.first.year PL210H.first.year PL210I.first.year
900                 2                 2                 1                 2
901                 1                 1                 1                 1
902                 1                 1                 1                 1
903                 2                 2                 2                 2
904                 1                 1                 1                 1
905                 2                 2                 2                 2
906                 7                 8                 2                 7
907                 5                 5                 5                 5
908                 1                 1                 1                 1
909                 1                 1                 1                 1
910                 1                 1                 1                 1
> class(my.df.full[,16])
[1] "integer"
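
To make the concern concrete, here is a minimal sketch on made-up data (the data frame `toy` and its columns `a` and `b` are hypothetical, not taken from my.df.full). It assumes the cluster package is installed and compares Euclidean distances on the raw integer codes with Gower dissimilarities computed by daisy() after converting the columns to unordered factors. It is only meant to illustrate the question, not to settle which measure is appropriate.

```r
library(cluster)  # provides daisy(); assumed to be installed

# Hypothetical toy rows using the same 1:9 integer coding
toy <- data.frame(a = c(1L, 1L, 7L),
                  b = c(1L, 2L, 8L))

# Euclidean distance on the raw codes: the numeric gap between codes
# counts, so rows 1 and 3 end up far apart simply because 7 and 8 are
# "large" numbers, although the categories have no order
d.euc <- as.matrix(dist(toy))

# The same data as unordered factors; for nominal variables Gower's
# coefficient is the share of variables on which two rows differ
toy.f <- as.data.frame(lapply(toy, factor, levels = 1:9))
d.gow <- as.matrix(daisy(toy.f, metric = "gower"))

d.euc  # rows 1 and 3 are much farther apart than rows 1 and 2
d.gow  # rows 1 and 2 differ on one of two variables, rows 1 and 3 on both
```

With the factor coding, any two different categories count as equally different, which seems closer to what "unordered" means. The resulting dissimilarity object could be passed to pam(), but daisy() computes all pairwise dissimilarities, so this may not be practical for a large data set.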