You can abuse the S4 class system to do this.

setClass("Size") # no representation, no prototype
setAs(from="character", to="Size", # nothing but a coercion method
    function(from){
        ret <- factor(from, levels=c("Small","Medium","Large"), ordered=TRUE)
        class(ret) <- c("Size", class(ret))
        ret
    })
z <- read.table(colClasses=c("integer", "Size"), text="7 Medium\n5 Large\n3 Large")
dput(z)
#structure(list(V1 = c(7L, 5L, 3L), V2 = structure(c(2L, 3L, 3L
#), .Label = c("Small", "Medium", "Large"), class = c("Size",
#"ordered", "factor"))), class = "data.frame", row.names = c(NA,
#-3L))
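Applied to the data in the original question, the same trick might look like the sketch below. The class name "AtoE" is made up for illustration, and the level set A to E is assumed from the example columns. The coercion function here returns a plain factor and does not tag it with the extra class.

```r
library(methods)

# Hypothetical class name; it exists only so that as() can find a coercion.
setClass("AtoE")
setAs("character", "AtoE",
      function(from) factor(from, levels = c("A", "B", "C", "D", "E")))

# The three example columns, one row per line of text.
d <- read.table(colClasses = rep("AtoE", 3),
                text = "A B C\nB B C\nC C D\nD E D\nE E C")

# Every column now carries the same five levels.
str(d)
```

Each column is first read as character and then passed through the coercion method, so values missing from a given column still appear among its levels.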
I wonder if this behavior is intended or if there is a more sanctioned way
to get read.table(colClasses=...) to make a factor with a specified set of
levels.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Dec 19, 2018 at 3:19 AM Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

> On 19/12/2018 5:58 AM, Luigi Marongiu wrote:
> > Dear all,
> > I have a data frame with character values where each character is a
> > level; however, not all columns of the data frame have the same
> > characters thus, when generating the data frame with stringsAsFactors
> > = TRUE, the levels are different for each column.
> > Is there a way to provide a single vector of levels and assign the
> > characters so that they match such vector?
> > Is there a way to do that not only when setting the data frame but
> > also when reading data from a file with read.table()?
> >
> > For instance, I have:
> > column_1 = c("A", "B", "C", "D", "E")
> > column_2 = c("B", "B", "C", "E", "E")
> > column_3 = c("C", "C", "D", "D", "C")
> > my.data <- data.frame(column_1, column_2, column_3, stringsAsFactors = TRUE)
> > > str(my.data)
> > 'data.frame': 5 obs. of 3 variables:
> > $ column_1: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
> > $ column_2: Factor w/ 3 levels "B","C","E": 1 1 2 3 3
> > $ column_3: Factor w/ 2 levels "C","D": 1 1 2 2 1
> >
> > Thank you
> >
>
> I don't think read.table() can do it for you automatically. To do it
> yourself, you need to get a vector of the levels.
> If you know this, just assign it to a variable; if you don't know it,
> compute it as
>
>   thelevels <- unique(unlist(lapply(my.data, levels)))
>
> Then set the levels of each column to thelevels:
>
>   # factor() matches the existing values to thelevels by label
>   my.data.new <- as.data.frame(lapply(my.data, function(x)
>       factor(x, levels = thelevels)))
>
> Duncan Murdoch
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
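
For reference, here is a small worked sketch of the re-levelling step on the example data from the question. It assumes the goal is to keep each value as it is while widening the level set, so it rebuilds each column with factor(x, levels = ...). The names fixed and relabelled are made up for illustration. The last lines show why a plain levels(x) <- assignment is risky here: it assigns the new labels by position, so existing values can change.

```r
column_1 <- c("A", "B", "C", "D", "E")
column_2 <- c("B", "B", "C", "E", "E")
column_3 <- c("C", "C", "D", "D", "C")
my.data <- data.frame(column_1, column_2, column_3, stringsAsFactors = TRUE)

# Union of the levels found in any column: "A" "B" "C" "D" "E"
thelevels <- unique(unlist(lapply(my.data, levels)))

# Rebuild each column so values are matched to thelevels by label.
fixed <- as.data.frame(lapply(my.data, function(x) factor(x, levels = thelevels)))
as.character(fixed$column_2)
# "B" "B" "C" "E" "E"

# Positional relabelling: the old levels B, C, E become A, B, C.
relabelled <- my.data$column_2
levels(relabelled) <- thelevels
as.character(relabelled)
# "A" "A" "B" "C" "C"
```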