Dear all, I have several character variables, each with a large number of distinct values: length(unique(x)) is in the range of 100-200. This creates problems because I would like to use them as predictors in a coxph model.
I would therefore like to convert each of these variables to a new variable (x_new). x_new should equal x for the top n categories (i.e. the n levels with the highest number of occurrences) and NAN elsewhere. For example, for n=3, x_new would contain the three most common levels of x, plus NAN. Is there some convenient way of doing this?

Thanks in advance,
Michael

Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
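
One possible way to do this in base R is sketched below. It assumes that "NAN" means R's missing value NA for character data, and that ties at the cut-off can be broken arbitrarily. The helper name keep_top_n, its argument other, and the example vector are made up for illustration and are not part of the original question.

```r
# Hypothetical helper (not from the original post): keep the n most
# frequent values of x and replace all other values with `other`.
keep_top_n <- function(x, n, other = NA_character_) {
  x <- as.character(x)
  # Count occurrences of each distinct value; table() drops NA by default.
  counts <- sort(table(x), decreasing = TRUE)
  # Names of the n most frequent values (fewer if x has fewer distinct values).
  top <- names(counts)[seq_len(min(n, length(counts)))]
  # Keep values that are in the top set, substitute `other` everywhere else.
  ifelse(x %in% top, x, other)
}

# Made-up example data: "a" and "b" occur 3 times, "c" twice, "d" and "e" once.
x <- c("a", "a", "a", "b", "b", "b", "c", "c", "d", "e")

x_new <- keep_top_n(x, n = 3)
table(x_new, useNA = "ifany")

# Alternative: lump the rare values into an explicit catch-all level
# and convert to a factor for use as a model predictor.
x_lumped <- factor(keep_top_n(x, n = 3, other = "Other"))
levels(x_lumped)
```

With the default na.action, model-fitting functions such as coxph typically drop rows that contain NA in a predictor, so an explicit catch-all label such as "Other" may be preferable to NA if those observations should stay in the model.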