abanero wrote:
>
> Do you know something like “knn1” that works with categorical variables
> too?
> Do you have any suggestion?
>

There are surely plenty of clustering algorithms around that do not require a vector space structure on the inputs (like KNN does). I think agglomerative clustering would solve the problem, as would kernel-based clustering (assuming that you have a positive semi-definite measure of the similarity of two samples).

Probably the simplest way is Affinity Propagation (http://www.psi.toronto.edu/index.php?q=affinity%20propagation; see the CRAN package "apcluster", which I have co-developed). All you need is a way of measuring the similarity of samples, which is straightforward for numerical variables, for categorical variables, and for mixtures of both (the choice of the similarity measures and how to aggregate the different variables is left to you, of course). Your final "classification" task can then be accomplished simply by assigning each new sample to the cluster whose exemplar is most similar to it.
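To make the last step concrete, here is a minimal sketch in plain Python (not the apcluster API). It assumes a Gower-style similarity, which is only one of many possible choices: categorical variables score 1 on a match and 0 otherwise, numerical variables score 1 minus the range-scaled absolute difference, and the per-variable scores are averaged. The names `mixed_similarity` and `assign_to_exemplar`, the example exemplars and the value range are all made up for illustration.

```python
from typing import Dict, Sequence, Union

Value = Union[float, str]


def mixed_similarity(a: Sequence[Value], b: Sequence[Value],
                     ranges: Dict[int, float]) -> float:
    """Gower-style similarity in [0, 1] for samples with mixed variables.

    `ranges` maps the index of each numerical variable to its value range
    in the training data (an assumption of this sketch).
    """
    scores = []
    for i, (x, y) in enumerate(zip(a, b)):
        if isinstance(x, str) or isinstance(y, str):
            # Categorical variable: simple matching.
            scores.append(1.0 if x == y else 0.0)
        else:
            # Numerical variable: range-scaled difference, clamped at 0
            # in case a new sample falls outside the training range.
            r = ranges[i]
            scores.append(max(0.0, 1.0 - abs(x - y) / r) if r > 0 else 1.0)
    return sum(scores) / len(scores)


def assign_to_exemplar(sample: Sequence[Value],
                       exemplars: Dict[str, Sequence[Value]],
                       ranges: Dict[int, float]) -> str:
    """Return the label of the cluster whose exemplar is most similar."""
    return max(exemplars,
               key=lambda label: mixed_similarity(sample, exemplars[label], ranges))


# Hypothetical exemplars, as a clustering step might have produced them.
exemplars = {
    "A": (1.70, "red", "yes"),
    "B": (1.50, "blue", "no"),
}
ranges = {0: 0.5}  # assumed range of the numerical variable at index 0

new_sample = (1.65, "red", "no")
assigned = assign_to_exemplar(new_sample, exemplars, ranges)
print(assigned)
```

The same similarity function could also be used to fill the pairwise similarity matrix that the clustering step itself takes as input; how the variables are weighted and aggregated remains a modelling decision.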
Joris Meys wrote:
>
> Not a direct answer, but from your description it looks like you are
> better
> off with supervised classification algorithms instead of unsupervised
> clustering.
>

If you are saying that this is a purely supervised task that can be solved without clustering, I disagree. abanero does not mention any class labels, so it seems to me that unsupervised clustering is indeed necessary as a first step. However, I agree that the second task, assigning new samples to clusters/classes/whatever, can also be solved by almost any supervised technique, provided the samples are first labeled according to their cluster membership.

Cheers, Ulrich

--
View this message in context: http://r.789695.n4.nabble.com/cluster-analysis-and-supervised-classification-an-alternative-to-knn1-tp2231656p2232902.html
Sent from the R help mailing list archive at Nabble.com.