Hi Tibor,

No, you are misunderstanding the source of the problem. It has nothing to do with factors.
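A minimal sketch of that claim, assuming a hypothetical data frame df_chr (not from your post) that contains no factor columns at all: appending a row built with c() still turns the numeric column into character.

```r
# Hypothetical data frame with one character and one numeric column;
# stringsAsFactors = FALSE guarantees that no factors are involved.
df_chr <- data.frame(P = c("mit", "ueber"),
                     RT = c(11157, 13719),
                     stringsAsFactors = FALSE)

# c() returns an atomic vector, so the number is stored as a string.
new_row <- c("in", 15709)
class(new_row)

# Appending that vector assigns a string into RT, which coerces the column.
out <- rbind(df_chr, new_row)
str(out)
```

If RT comes back as chr here as well, factor levels cannot be the cause.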
Instead, it has to do with the inability of a vector to hold more than one
class. You are using rbind() to add a new row to your data frame, but you
build that row with c(), which returns an atomic vector, so its numeric
element is coerced to character. That's what is forcing your numeric column
to become character: you're adding a character to it.

> c("in", "V>N", round(runif(1, 7000, 16000), 0))
[1] "in"    "V>N"   "15709"

It has nothing whatsoever to do with factors or factor levels, and would
occur just the same if you were adding that vector to a data frame with
character columns.

If you want to mix types, you cannot use a vector. Use a one-row data frame
instead:

> c2 <- data.frame(P = "in", ANSWER = "V>N", RT = round(runif(1, 7000, 16000), 0))
> str(rbind(df, c2))
'data.frame':   7 obs. of  3 variables:
 $ P     : Factor w/ 4 levels "mit","mittels",..: 2 1 2 3 1 1 4
 $ ANSWER: Factor w/ 3 levels "OBJ>PP","PP>OBJ",..: 2 2 2 2 1 1 3
 $ RT    : num  10867 14808 11600 15881 8984 ...

Sarah

On Tue, Sep 20, 2022 at 8:45 AM Tibor Kiss via R-help <r-help@r-project.org> wrote:
>
> Hi,
>
> this is a misunderstanding of my question. I wasn’t worried about invalid
> factor levels that produce NA. My question was why a column changes its
> class, which I thought was a side effect. If you add a vector containing one
> character string, the class of the whole vector becomes _chr_. And after this
> element has been added to a column, we have two NAs for the column which are
> factors, and a character string, which is responsible for the change of a
> numerical vector into a character string vector (see ?c, where you find: "The
> output type is determined from the highest type of the components in the
> hierarchy NULL < raw < logical < integer < double < complex < character <
> list < expression.").
>
> Best
>
> Tibor
>
> > On 19.09.2022 at 13:59, Ebert,Timothy Aaron <teb...@ufl.edu> wrote:
> >
> > In your example code, the variable remains a class factor, and all entries
> > are valid. The variables will behave as expected given the factor levels in
> > the original dataframe.
> >
> > (At least on my system R 4.2, in RStudio, in Windows) R returns a couple of
> > error messages warning me that I was bad.
> > What you get is NA for "not available", or "not appropriate" or a missing
> > value. You gave the system an invalid factor level so it was entered as
> > missing. If you get data that has a new factor level, you need to tell R to
> > expect a new factor level first.
> >
> > levels(f1) <- c(levels(f1),"New Level")
> > levels(f1) <- c(levels(f1),c("NL1","NL2"))
> >
> > Tim
> > -----Original Message-----
> > From: R-help <r-help-boun...@r-project.org> On Behalf Of Tibor Kiss via
> > R-help
> > Sent: Monday, September 19, 2022 6:11 AM
> > To: r-help@r-project.org
> > Subject: [R] Question concerning side effects of treating invalid factor
> > levels
> >
> > Dear List members,
> >
> > I have tried now for several times to find out about a side effect of
> > treating invalid factor levels, but did not find an answer. Various answers
> > on stackexchange etc. produce the stuff that irritates me without even
> > mentioning it.
> > So I am asking the list (apologies if this has been treated in the past).
> >
> > If you add an invalid factor level to a column in a data frame, this has
> > the side effect of turning a numerical column into a column with character
> > strings. Here is a simple example:
> >
> >> df <- data.frame(
> >      P = factor(c("mittels", "mit", "mittels", "ueber", "mit", "mit")),
> >      ANSWER = factor(c(rep("PP>OBJ", 4), rep("OBJ>PP", 2))),
> >      RT = round(runif(6, 7000, 16000), 0))
> >
> >> str(df)
> > 'data.frame':   6 obs. of  3 variables:
> >  $ P     : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1
> >  $ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1
> >  $ RT    : num  11157 13719 14388 14527 14686 ..
> >
> >> df <- rbind(df, c("in", "V>N", round(runif(1, 7000, 16000), 0)))
> >
> >> str(df)
> > 'data.frame':   7 obs. of  3 variables:
> >  $ P     : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1 NA
> >  $ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1 NA
> >  $ RT    : chr  "11478" "15819" "8305" "8852" ...
> >
> > You see that RT has changed from _num_ to _chr_ as a side effect of adding
> > the invalid factor level as NA. I would appreciate understanding what the
> > purpose of the type coercion is.
> >
> > Thanks in advance
> >
> > Tibor
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.r-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
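
The hierarchy from ?c quoted above can be checked directly at the console. A minimal sketch with arbitrary illustrative values (none of them taken from the thread's data): each call mixes two types, and the result takes the higher one.

```r
# Each c() call mixes two types; the result takes the higher type in
# NULL < raw < logical < integer < double < complex < character < list.
class(c(TRUE, 1L))        # logical + integer  -> "integer"
class(c(1L, 2.5))         # integer + double   -> "numeric"
class(c(2.5, "V>N"))      # double + character -> "character"
class(c("V>N", list(1)))  # character + list   -> "list"

# A list keeps each element's own type, which is why a one-row
# data frame (a list of columns) can hold mixed types in one row.
row_list <- list(P = "in", ANSWER = "V>N", RT = 15709)
sapply(row_list, class)
```

The last call is the reason a one-row data frame works where a vector does not: each column keeps its own type.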
--
Sarah Goslee (she/her)
http://www.numberwright.com