Hi Martin,

many thanks for your answer and your detailed explanation.
I am new to R, got help on this list, and wanted to give something back by sharing what looked OK to me.

Regarding 0) You're right, it's pseudocode. I assumed that anybody on the list would be able to adapt the code to their needs so that it worked. Next time I will post runnable code.

Regarding 1) You're right: "[, i]" is missing. My fault. Sorry.

Regarding 3) I got your point and will do better in the future.

One question: What books do you recommend for getting to know R better?

Kind regards

Georg


From: Martin Maechler <maech...@stat.math.ethz.ch>
To: <g.maub...@weinwolf.de>
Cc: Carl Sutton <suttonc...@ymail.com>, "r-help@r-project.org" <r-help@r-project.org>
Date: 04.05.2016 09:05
Subject: [R] Reply: Re: selecting columns from a data frame or data table by type, ie, numeric, integer


>>>>> <g.maub...@weinwolf.de>
>>>>> on Wed, 4 May 2016 08:30:50 +0200 writes:

> Hi All,
> Hi Carl,
>
> I am not sure if this is useful to you, but I followed your conversation
> and thought of you when I read this:
>
> for (i in 1:ncol(dataset)) {
>   if(class(dataset) == "character|numeric|factor|or whatsoever") {
>     dataset[, i] <- as.factor(dataset[, i])
>   }
> }

Ouch -- so many problems in such a short piece of R code !!!

> Source: Zumel, Nina / Mount, John: Practical Data Science with R, Manning
> Publications: Shelter Island, 2014, Chapter 2: Loading data into R, p. 25

Sorry, but after reading the above, I'd strongly recommend getting better books about R...
{{maybe do not take those containing "data science" ;-)}}

Compared to the nice and efficient solution of Bill Dunlap, the above is really bad-bad-bad in at least four ways:

0) The way you write it above, you cannot use it:
       <string> == "variant1|variant2|..."
   is pseudocode and does not really work.

1) Note the missing "[, i]" in the 2nd line: It should be
       if(class(dataset[, i]) ...

2) A for loop changing each column at a time is really slow for largish data sets.

3) [last but not at all least!]
   Please ... many of you readers, do learn:

   Using checks such as
       if ( class(x) == "numeric" )
   is (almost) always wrong by design !!!

   Instead you really should (almost) always use
       if(inherits(x, "numeric"))

Why? Because classes in R (S3 or S4) can *extend* other classes.
Example: Many of you know that after

    fm <- glm(...)

class(fm) is c("glm", "lm"), and so

    > if(class(fm) == "lm")
    + "yes"
    Warning message:
    In if (class(fm) == "lm") "yes" :
      the condition has length > 1 and only the first element will be used

Similarly, in your case

    y <- 1:10
    class(y) <- c("myNumber", "numeric")

when that 'y' is a column in your data frame, the test

    if(class(dataset[,i]) == "numeric")

will *not* work but actually produce the above warning.

However, one could also have had

    > Num <- setClass("Num", contains="numeric")
    > N <- Num(1:10)
    > N
    An object of class "Num"
     [1]  1  2  3  4  5  6  7  8  9 10
    > if(class(N) == "numeric") "yes" else "no"
    [1] "no"
    >

I hope that many of the readers --- including *MANY* authors of R packages !! --- have understood the above and will fix their R code -- and even more their books where applicable !!

Martin Maechler,
ETH Zurich & R Core Team

>
> This way you can select variables of a certain class only and do
> transformations. I found that this approach is not applicable if used with
> statistical functions like head(). Transformations worked fine for me.
>
> I found reading the above given source worthwhile.
>
> Kind regards
>
> Georg
>
> PS: I am not related to the above given authors. I am just a reader
> reporting on - at least to me - a valuable resource.
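
The difference between comparing class() with == and asking about inheritance can be tried out directly. The following is only a minimal sketch built from the objects named in the message above (y, Num, N); the use of is() for the S4 object is a choice made here for illustration, not something the thread prescribes. Note that in R 4.2.0 and later, an if() condition of length greater than one is an error rather than the warning shown above.

```r
library(methods)  # needed for setClass() and is() in non-interactive scripts

# S3 case: the class attribute has two elements, so == returns two values.
y <- 1:10
class(y) <- c("myNumber", "numeric")

s3_equal    <- class(y) == "numeric"    # FALSE TRUE: one result per class element
s3_inherits <- inherits(y, "numeric")   # TRUE: a single answer, safe inside if()

# S4 case: class(N) is "Num", so the equality test misses the parent class.
Num <- setClass("Num", contains = "numeric")
N <- Num(1:10)

s4_equal <- any(class(N) == "numeric")  # FALSE: "Num" is not the string "numeric"
s4_is    <- is(N, "numeric")            # TRUE: is() follows the S4 class hierarchy

print(s3_equal)
print(s3_inherits)
print(s4_equal)
print(s4_is)
```

Both inheritance checks return exactly one logical value, which is what if() needs.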
>
> From: Carl Sutton via R-help <r-help@r-project.org>
> To: William Dunlap <wdun...@tibco.com>
> Cc: "r-help@r-project.org" <r-help@r-project.org>
> Date: 29.04.2016 22:08
> Subject: Re: [R] selecting columns from a data frame or data table
> by type, ie, numeric, integer
> Sent by: "R-help" <r-help-boun...@r-project.org>
>
> Thank you Bill Dunlap. So simple I never tried that approach. Tried
> dozens of others though, read manuals till I was getting headaches, and of
> course the answer was simple when one is competent. Learning, it's a
> struggle, but slowly getting there.
> Thanks again
> Carl Sutton CPA
>
> On Friday, April 29, 2016 10:50 AM, William Dunlap <wdun...@tibco.com>
> wrote:
>
> > dt1[ vapply(dt1, FUN=is.numeric, FUN.VALUE=NA) ]
>     a   c
> 1   1 1.1
> 2   2 1.0
> ...
> 10 10 0.2
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Fri, Apr 29, 2016 at 9:19 AM, Carl Sutton via R-help
> <r-help@r-project.org> wrote:
>
> Good morning R gurus,
> I have a data frame of 575 columns. I want to extract only those columns
> that are numeric (double) or integer to do some machine learning with. I
> have searched the web for a couple of days (off and on) and have not found
> anything that shows how to do this. Lots of ways to extract rows, but
> not columns. I have attempted to use the "(x == y)" index extraction method,
> but that threw an error that == was for atomic vectors and lists, and I was
> doing this on a data frame.
>
> My test code is below
>
> # a technique to get column classes
> library(data.table)
> a <- 1:10
> b <- c("a","b","c","d","e","f","g","h","i","j")
> c <- seq(1.1, .2, length = 10)
> dt1 <- data.table(a,b,c)
> str(dt1)
> col.classes <- sapply(dt1, class)
> head(col.classes)
> dt2 <- subset(dt1, typeof = "double" | "numeric")
> str(dt2)
> dt2 # not subset
> dt2 <- dt1[, list(typeof = "double")]
> str(dt2)
> class_data <- dt1[,sapply(dt1,is.integer) | sapply(dt1, is.numeric)]
> class_data
> sum(class_data)
> typeof(class_data)
> names(class_data)
> str(class_data)
>
> Any help is appreciated
> Carl Sutton CPA

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
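
For readers who want to try the vapply() approach from Bill Dunlap's reply, here is a minimal sketch on a plain data frame that mirrors the test data in the original question. The names df, is_num, df_num and is_chr are made up for this illustration, and the factor conversion at the end is only one possible loop-free replacement for the for loop criticised earlier in the thread.

```r
# Small data frame mirroring the test data from the question (a, b, c).
df <- data.frame(
  a = 1:10,
  b = letters[1:10],
  c = seq(1.1, 0.2, length.out = 10),
  stringsAsFactors = FALSE
)

# One logical flag per column. is.numeric() is TRUE for integer and double,
# so no separate is.integer() test is needed. FUN.VALUE = logical(1) makes
# vapply() stop with an error if any result is not a single logical value.
is_num <- vapply(df, is.numeric, FUN.VALUE = logical(1))

# Keep only the flagged columns; list-style indexing selects columns.
df_num <- df[is_num]
print(names(df_num))

# Loop-free variant of the "convert selected columns" idea:
# turn every character column into a factor in one assignment.
is_chr <- vapply(df, is.character, FUN.VALUE = logical(1))
df[is_chr] <- lapply(df[is_chr], as.factor)
str(df)
```

With a data.table instead of a data frame, single-argument indexing selects rows, so the column selection would probably be written as dt1[, .SD, .SDcols = names(is_num)[is_num]] instead.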