Hi,

On Mon, May 2, 2011 at 12:45 PM, Clemontina Alexander <ckale...@ncsu.edu> wrote:
> Hi! This is my first time posting. I've read the general rules and
> guidelines, but please bear with me if I make some fatal error in
> posting. Anyway, I have a continuous response and 29 predictors made
> up of continuous variables and nominal and ordinal categorical
> variables. I'd like to do lasso on these, but I get an error. The way
> I am using "lars" doesn't allow for the factors. Is there a special
> option or some other method in order to do lasso with cat. variables?
>
> Here is an example (considering ordinal variables as just nominal):
>
> set.seed(1)
> Y <- rnorm(10,0,1)
> X1 <- factor(sample(x=LETTERS[1:4], size=10, replace = TRUE))
> X2 <- factor(sample(x=LETTERS[5:10], size=10, replace = TRUE))
> X3 <- sample(x=30:55, size=10, replace=TRUE) # think age
> X4 <- rchisq(10, df=4, ncp=0)
> X <- data.frame(X1,X2,X3,X4)
>
>> str(X)
> 'data.frame': 10 obs. of 4 variables:
> $ X1: Factor w/ 4 levels "A","B","C","D": 4 1 3 1 2 2 1 2 4 2
> $ X2: Factor w/ 5 levels "E","F","G","H",..: 3 4 3 2 5 5 5 1 5 3
> $ X3: int 51 46 50 44 43 50 30 42 49 48
> $ X4: num 2.86 1.55 1.94 2.45 2.75 ...
>
>
> I'd like to do:
> obj <- lars(x=X, y=Y, type = "lasso")
>
> Instead, what I have been doing is converting all data to continuous
> but I think this is really bad!

Yeah, it is: recoding factor levels as numbers imposes an ordering and
equal spacing that nominal variables don't have. Check out the
"Categorical Predictor Variables" section here for a way to handle such
predictor vars:

http://www.psychstat.missouristate.edu/multibook/mlt08m.html

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
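
For reference, here is a minimal sketch of one common way to do the dummy coding in R before calling lars. It assumes treatment contrasts (R's default for unordered factors) and uses base R's model.matrix() to build the indicator columns; the name Xmat is just for illustration and is not from the original post. The lars call is guarded so the rest runs even if the package isn't installed.

```r
# Rebuild the example data from the original post.
set.seed(1)
Y  <- rnorm(10, 0, 1)
X1 <- factor(sample(x = LETTERS[1:4],  size = 10, replace = TRUE))
X2 <- factor(sample(x = LETTERS[5:10], size = 10, replace = TRUE))
X3 <- sample(x = 30:55, size = 10, replace = TRUE)  # think age
X4 <- rchisq(10, df = 4, ncp = 0)
X  <- data.frame(X1, X2, X3, X4)

# Expand each factor into 0/1 indicator (dummy) columns, one per
# non-reference level; numeric columns pass through unchanged.
# The first column of model.matrix() is the intercept, which lars()
# fits on its own by default, so drop it here.
Xmat <- model.matrix(~ ., data = X)[, -1, drop = FALSE]

# lars() wants a numeric matrix, which Xmat now is.
if (requireNamespace("lars", quietly = TRUE)) {
  obj <- lars::lars(x = Xmat, y = Y, type = "lasso")
  print(obj)
}
```

One thing to keep in mind with this approach: the lasso penalizes each indicator column separately, so it may keep some levels of a factor and drop others, and which levels get indicators depends on the reference level chosen by the contrasts.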