[R] Strange behavior when sampling rows of a data frame

2020-06-19 Thread Sébastien Lahaie
I ran into some strange behavior in R when trying to assign a treatment to
rows in a data frame. I'm wondering whether any R experts can explain
what's going on.

First, let's assign a treatment to 3 out of 10 rows as follows.

> df <- data.frame(unit = 1:10)
> df$treated <- FALSE
> s <- sample(nrow(df), 3)
> df[s,]$treated <- TRUE
> df
   unit treated
1     1   FALSE
2     2    TRUE
3     3   FALSE
4     4   FALSE
5     5    TRUE
6     6   FALSE
7     7    TRUE
8     8   FALSE
9     9   FALSE
10   10   FALSE

This is as expected. Now we'll just skip the intermediate step of saving
the sampled indices, and apply the treatment directly as follows.

> df <- data.frame(unit = 1:10)
> df$treated <- FALSE
> df[sample(nrow(df), 3),]$treated <- TRUE
> df
   unit treated
1     6    TRUE
2     2   FALSE
3     3   FALSE
4     9    TRUE
5     5   FALSE
6     6   FALSE
7     7   FALSE
8     5    TRUE
9     9   FALSE
10   10   FALSE

Now the data frame still has 10 rows with 3 assigned to the treatment. But
the units are garbled. Units 1 and 4 have disappeared, for instance, and
there are duplicates for 6 and 9, one assigned to treatment and the other
to control. Why would this happen?

Thanks,
Sebastien


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange behavior when sampling rows of a data frame

2020-06-19 Thread Sébastien Lahaie
Thank you all for the responses, these are the insights I was hoping for.
There are many ways to get this right, and I happened to run into one that
has a glitch. I see from Luke's explanation how the strange output came
about. Glad to hear that this bug/behavior is already known.
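
For readers following along, the double evaluation behind the garbled output can be sketched as below. This is a minimal illustration, assuming the standard expansion of nested replacement calls described in the R Language Definition, where the index expression appears once for reading the rows and once for writing them back. The names `pick`, `calls`, `first`, and `second` are hypothetical and not from the thread.

```r
# Sketch only: `pick` and `calls` are hypothetical names used for illustration.
calls <- 0
pick <- function(n, k) {
  calls <<- calls + 1   # record each evaluation of the index expression
  sample(n, k)
}

df <- data.frame(unit = 1:10)
df$treated <- FALSE
df[pick(nrow(df), 3), ]$treated <- TRUE
calls   # the index expression ran twice

# Roughly what the nested assignment does, written out by hand:
df2 <- data.frame(unit = 1:10)
df2$treated <- FALSE
first  <- sample(nrow(df2), 3)   # rows that are read
second <- sample(nrow(df2), 3)   # rows that are written (a fresh draw)
rows <- df2[first, ]
rows$treated <- TRUE
df2[second, ] <- rows            # units from `first` land in positions `second`
```

Because the two draws generally differ, whole rows (including `unit`) are copied from one set of positions to another, which would explain the duplicated and missing units.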

On Fri, Jun 19, 2020 at 7:04 PM Daniel Nordlund wrote:

> On 6/19/2020 5:49 AM, Sébastien Lahaie wrote:
> > I ran into some strange behavior in R when trying to assign a treatment to
> > rows in a data frame. I'm wondering whether any R experts can explain
> > what's going on.
> >
> > First, let's assign a treatment to 3 out of 10 rows as follows.
> >
> > df <- data.frame(unit = 1:10)
> > df$treated <- FALSE
> > s <- sample(nrow(df), 3)
> > df[s,]$treated <- TRUE
> > df
> >    unit treated
> > 1     1   FALSE
> > 2     2    TRUE
> > 3     3   FALSE
> > 4     4   FALSE
> > 5     5    TRUE
> > 6     6   FALSE
> > 7     7    TRUE
> > 8     8   FALSE
> > 9     9   FALSE
> > 10   10   FALSE
> >
> > This is as expected. Now we'll just skip the intermediate step of saving
> > the sampled indices, and apply the treatment directly as follows.
> >
> > df <- data.frame(unit = 1:10)
> > df$treated <- FALSE
> > df[sample(nrow(df), 3),]$treated <- TRUE
> > df
> >    unit treated
> > 1     6    TRUE
> > 2     2   FALSE
> > 3     3   FALSE
> > 4     9    TRUE
> > 5     5   FALSE
> > 6     6   FALSE
> > 7     7   FALSE
> > 8     5    TRUE
> > 9     9   FALSE
> > 10   10   FALSE
> >
> > Now the data frame still has 10 rows with 3 assigned to the treatment. But
> > the units are garbled. Units 1 and 4 have disappeared, for instance, and
> > there are duplicates for 6 and 9, one assigned to treatment and the other
> > to control. Why would this happen?
> >
> > Thanks,
> > Sebastien
> >
> Sébastien,
>
> You have received good explanations of what is going on with your code.
> I think you can get what you want by making a simple modification of
> your treatment assignment statement. At least it works for me.
>
> df[sample(nrow(df),3), 'treated'] <- TRUE
>
> Hope this is helpful,
>
> Dan
>
> --
> Daniel Nordlund
> Port Townsend, WA  USA
>
>
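
A quick check of the one-step assignment suggested above might look like the following. It is a minimal sketch rather than a definitive recipe: the seed is arbitrary and only there for reproducibility, and the `%in%` variant at the end is an alternative not mentioned in the thread.

```r
set.seed(42)  # arbitrary seed, assumed only for reproducibility

df <- data.frame(unit = 1:10)
df$treated <- FALSE

# The index expression appears in a single `[<-` call, so it is evaluated once.
df[sample(nrow(df), 3), "treated"] <- TRUE

# Units stay intact and exactly three rows are treated.
stopifnot(identical(df$unit, 1:10), sum(df$treated) == 3)

# Alternative (not from the thread): build the whole column in one go.
df_alt <- data.frame(unit = 1:10)
df_alt$treated <- seq_len(nrow(df_alt)) %in% sample(nrow(df_alt), 3)
```

Either form touches only the `treated` column, so the `unit` values cannot be overwritten.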
