Re: [R] Cluster analysis, factor variables, large data set

2011-03-31, Peter Langfelder
On Thu, Mar 31, 2011 at 11:48 AM, Hans Ekbrand wrote:
>
> The variables are unordered factors, stored as integers 1:9, where
>
> 1 means "Full-time employment"
> 2 means "Part-time employment"
> 3 means "Student"
> 4 means "Full-time self-employee"
> ...
>
> Does euclidean distances make sense on
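A minimal Python sketch of why Euclidean distance on integer-coded unordered factors is questionable. The three respondents and their four-month sequences are hypothetical, and Python is used only for illustration; none of this is R's cluster package. With raw codes, "self-employed every month" ends up three times farther from "full-time every month" than "part-time every month" does, purely because 4 > 2. A simple matching distance (share of months that differ) and Euclidean distance on one-hot indicator columns both treat every mismatch alike.

```python
import math

# Hypothetical monthly status codes for three respondents over four months,
# using the coding quoted above: 1 = full-time employment,
# 2 = part-time employment, 4 = full-time self-employee.
a = [1, 1, 1, 1]
b = [2, 2, 2, 2]
c = [4, 4, 4, 4]
LEVELS = [1, 2, 3, 4]


def euclidean(x, y):
    """Euclidean distance treating the codes as numbers (questionable for nominal data)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def simple_matching(x, y):
    """Share of months in which the two statuses differ (0 = identical, 1 = all differ)."""
    return sum(xi != yi for xi, yi in zip(x, y)) / len(x)


def one_hot(x, levels):
    """Replace each code by one 0/1 indicator per level, flattened into one vector."""
    return [1.0 if v == lev else 0.0 for v in x for lev in levels]


print(euclidean(a, b), euclidean(a, c))              # 2.0 6.0
print(simple_matching(a, b), simple_matching(a, c))  # 1.0 1.0
# On one-hot vectors every mismatch costs the same, whichever codes are involved.
print(euclidean(one_hot(a, LEVELS), one_hot(b, LEVELS)) ==
      euclidean(one_hot(a, LEVELS), one_hot(c, LEVELS)))  # True
```

The squared Euclidean distance between one-hot vectors equals twice the number of mismatching months, so a method restricted to Euclidean distance, run on indicator columns, ranks pairs the same way simple matching does. This is a general property of the encoding, not a claim made in the thread.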

Re: [R] Cluster analysis, factor variables, large data set

2011-03-31, Hans Ekbrand
On Thu, Mar 31, 2011 at 08:48:02PM +0200, Hans Ekbrand wrote:
> On Thu, Mar 31, 2011 at 07:06:31PM +0100, Christian Hennig wrote:
> > Dear Hans,
> >
> > clara doesn't require a distance matrix as input (and therefore
> > doesn't require you to run daisy), it will work with the raw data
> > matrix

Re: [R] Cluster analysis, factor variables, large data set

2011-03-31, Hans Ekbrand
On Thu, Mar 31, 2011 at 07:06:31PM +0100, Christian Hennig wrote:
> Dear Hans,
>
> clara doesn't require a distance matrix as input (and therefore
> doesn't require you to run daisy), it will work with the raw data
> matrix using
> Euclidean distances implicitly.
> I can't tell you whether Euclide

Re: [R] Cluster analysis, factor variables, large data set

2011-03-31, Christian Hennig
Dear Hans,

clara doesn't require a distance matrix as input (and therefore doesn't require you to run daisy); it will work with the raw data matrix using Euclidean distances implicitly. I can't tell you whether Euclidean distances are appropriate in this situation (this depends on the interpre
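A rough Python sketch of the general CLARA idea, to show why no full n × n distance matrix is needed. It runs k-medoids on several small random subsamples, scores each candidate set of medoids against the whole data set (only n × k distances per candidate), and keeps the best. This is not R's `cluster::clara`: the function names, the parameter values, and the simple alternating k-medoids used in place of PAM's swap phase are all assumptions made for illustration, and the two-blob data are synthetic.

```python
import math
import random


def euclidean(x, y):
    """Plain Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def k_medoids(points, k, dist, rng, max_iter=50):
    """Simple alternating k-medoids (a stand-in for PAM); returns indices into points."""
    medoids = rng.sample(range(len(points)), k)
    for _ in range(max_iter):
        # Assign every point to its nearest current medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: dist(p, points[m]))
            clusters[nearest].append(i)
        # In each cluster, choose the member with the smallest total distance to the rest.
        new_medoids = []
        for m, members in clusters.items():
            if not members:
                new_medoids.append(m)
                continue
            new_medoids.append(
                min(members, key=lambda c: sum(dist(points[c], points[j]) for j in members))
            )
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return medoids


def clara_like(data, k, dist, n_samples=5, sample_size=40, seed=0):
    """Hypothetical CLARA-style clustering: k-medoids on subsamples, scored on all data."""
    rng = random.Random(seed)
    best_cost, best_medoids = math.inf, None
    for _ in range(n_samples):
        idx = rng.sample(range(len(data)), min(sample_size, len(data)))
        sample = [data[i] for i in idx]
        # Map medoid positions in the subsample back to indices in the full data.
        medoids = [idx[m] for m in k_medoids(sample, k, dist, rng)]
        # Scoring needs only n * k distances, never the full n x n matrix.
        cost = sum(min(dist(x, data[m]) for m in medoids) for x in data)
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    labels = [
        min(range(len(best_medoids)), key=lambda j: dist(x, data[best_medoids[j]]))
        for x in data
    ]
    return best_medoids, labels


# Synthetic data: two well-separated groups of 200 points each.
gen = random.Random(1)
data = [[gen.gauss(0, 0.5), gen.gauss(0, 0.5)] for _ in range(200)]
data += [[gen.gauss(10, 0.5), gen.gauss(10, 0.5)] for _ in range(200)]
medoids, labels = clara_like(data, k=2, dist=euclidean)
print(medoids, len(set(labels[:200])), len(set(labels[200:])))
```

The `dist` argument is pluggable here only to keep the sketch general; as Christian notes, clara itself works on the raw data matrix with Euclidean distances.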

[R] Cluster analysis, factor variables, large data set

2011-03-31, Hans Ekbrand
Dear R helpers,

I have a large data set with 36 variables and about 50,000 cases. The variables represent labour market status during 36 months; there are 8 different variable values (e.g. Full-time Employment, Student, ...). Only cases with at least one change in labour market status are included i
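Two details of this setup can be made concrete in a short Python sketch: the "at least one change" filter, and why a full dissimilarity matrix (for example from daisy) gets large at this size. The sequences and function names are hypothetical, and the storage figure assumes the lower triangle of the n × n matrix is kept as 8-byte doubles, which is how R stores `dist`-type objects.

```python
def has_status_change(sequence):
    """True if the monthly status sequence contains at least one change."""
    return any(a != b for a, b in zip(sequence, sequence[1:]))


def dissimilarity_bytes(n_cases, bytes_per_value=8):
    """Storage for the lower triangle of an n x n dissimilarity matrix: n*(n-1)/2 values."""
    return n_cases * (n_cases - 1) // 2 * bytes_per_value


# Hypothetical 36-month sequences, coded as in the thread.
stable = [1] * 36              # full-time employment throughout: would be excluded
mover = [3] * 12 + [1] * 24    # student for a year, then full-time employment
print([has_status_change(s) for s in (stable, mover)])  # [False, True]

print(dissimilarity_bytes(50_000))  # 9999800000, i.e. roughly 10 GB
```

At 50,000 cases that is 1,249,975,000 pairwise values, which is the practical reason for preferring a method such as clara that never builds the full matrix.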