I ran into some strange behavior in R when trying to assign a treatment to rows in a data frame. I'm wondering whether any R experts can explain what's going on.
First, let's assign a treatment to 3 out of 10 rows as follows.

> df <- data.frame(unit = 1:10)
> df$treated <- FALSE
>
> s <- sample(nrow(df), 3)
> df[s,]$treated <- TRUE
>
> df
   unit treated
1     1   FALSE
2     2    TRUE
3     3   FALSE
4     4   FALSE
5     5    TRUE
6     6   FALSE
7     7    TRUE
8     8   FALSE
9     9   FALSE
10   10   FALSE

This is as expected. Now we'll skip the intermediate step of saving the sampled indices and apply the treatment directly, as follows.

> df <- data.frame(unit = 1:10)
> df$treated <- FALSE
>
> df[sample(nrow(df), 3),]$treated <- TRUE
>
> df
   unit treated
1     6    TRUE
2     2   FALSE
3     3   FALSE
4     9    TRUE
5     5   FALSE
6     6   FALSE
7     7   FALSE
8     5    TRUE
9     9   FALSE
10   10   FALSE

Now the data frame still has 10 rows with 3 assigned to the treatment. But the units are garbled. Units 1 and 4 have disappeared, for instance, and there are duplicates for 6 and 9, one assigned to treatment and the other to control.

Why would this happen?

Thanks,
Sebastien
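
P.S. A likely explanation, offered as a sketch rather than a definitive account: R rewrites a nested replacement such as df[i,]$treated <- TRUE so that the index expression i appears twice, once to extract the rows and once to write them back. If i is a call to sample(), each appearance draws a fresh sample, so rows extracted from one set of positions get written into a different set. The names i_read, i_write and tmp below are made up for illustration, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable

df <- data.frame(unit = 1:10, treated = FALSE)

# Roughly what df[sample(nrow(df), 3), ]$treated <- TRUE amounts to.
# The two index vectors are hypothetical names for the two evaluations
# of the same sample() call.
i_read  <- sample(nrow(df), 3)   # rows that get extracted
i_write <- sample(nrow(df), 3)   # rows that get overwritten
tmp <- df[i_read, ]              # extracted rows keep their unit values
tmp$treated <- TRUE
df[i_write, ] <- tmp             # units from i_read land in rows i_write

# One way to avoid the problem: index the column, so the sample()
# call appears only once in the rewritten expression.
df2 <- data.frame(unit = 1:10, treated = FALSE)
df2$treated[sample(nrow(df2), 3)] <- TRUE
```

Saving the indices first, as in the first example above, avoids the issue for the same reason: s is a fixed vector, so evaluating it twice gives the same rows both times.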