Re: [R] R Processing dataframe by group - equivalent to SAS by group processing with a first. and retain statements

David Winsemius via R-help Wed, 27 Nov 2024 10:25:11 -0800


On 11/27/24 08:30, Sorkin, John wrote:
> I am an old, longtime SAS programmer. I need to produce R code that 
> processes a dataframe in a manner that is equivalent to that produced by 
> using a by statement in SAS together with an if first.ID statement and a retain 
> statement:
>
> I want to take data (olddata) that looks like this:
> ID    Day
> 1     1
> 1     1
> 1     2
> 1     2
> 1     3
> 1     3
> 1     4
> 1     4
> 1     5
> 1     5
> 2     5
> 2     5
> 2     5
> 2     6
> 2     6
> 2     6
> 3     10
> 3     10
>
> and make it look like this:
> (within each ID I am copying the first value of Day into a new variable, 
> FirstDay, and propagating the FirstDay value through all rows that have the 
> same ID):
>
> ID    Day     FirstDay
> 1     1       1
> 1     1       1
> 1     2       1
> 1     2       1
> 1     3       1
> 1     3       1
> 1     4       1
> 1     4       1
> 1     5       1
> 1     5       1
> 2     5       5
> 2     5       5
> 2     5       5
> 2     6       5
> 2     6       5
> 2     6       5
> 3     10      10
> 3     10      10
>
> SAS code that can do this is:
>
> proc sort data=olddata;
>    by ID Day;
> run;
>
> data newdata;
>    retain FirstDay;
>    set olddata;
>    by ID;
>    if first.ID then FirstDay=Day;
> run;
>
> I have NO idea how to do this in R (so I can't post test code), but below I 
> have R code that creates olddata:
>
> ID <- c(rep(1,10),rep(2,6),rep(3,2))
> date <- c(rep(1,2),rep(2,2),rep(3,2),rep(4,2),rep(5,2),
>            rep(5,3),rep(6,3),rep(10,2))
> date
> olddata <- data.frame(ID=ID,date=date)
> olddata
>
> Any suggestions on how to do this would be appreciated. . . I have worked on 
> this for more than 12 hours and, despite multiple web searches, I have gotten 
> nowhere. . .
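For comparison with the SAS DATA step quoted above, here is a minimal sketch of a literal, loop-based R translation of the first.ID / retain logic. It is not code from this thread; the names `retained` and `first_ID` are illustrative, and it assumes the grouping and value columns are `ID` and `date`, as in the R code that builds olddata. It is mainly useful for seeing how the SAS concepts map onto R; the vectorised approaches are shorter.

```r
# Example data, as built in the question (the value column is called "date")
ID   <- c(rep(1, 10), rep(2, 6), rep(3, 2))
date <- c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2), rep(5, 2),
          rep(5, 3), rep(6, 3), rep(10, 2))
olddata <- data.frame(ID = ID, date = date)

# PROC SORT equivalent: order the rows by ID, then by date within ID
olddata <- olddata[order(olddata$ID, olddata$date), ]

n        <- nrow(olddata)
FirstDay <- numeric(n)   # the new column, filled row by row
retained <- NA           # stands in for the RETAINed SAS variable
for (i in seq_len(n)) {
  # first.ID is TRUE on the first row and whenever ID differs from the previous row
  first_ID <- i == 1 || olddata$ID[i] != olddata$ID[i - 1]
  if (first_ID) retained <- olddata$date[i]  # "if first.ID then FirstDay = Day"
  FirstDay[i] <- retained                    # value carried forward, as RETAIN does
}
olddata$FirstDay <- FirstDay
```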


There's an R base function named, wait for it, ... `by`

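It returns a list-like object (of class "by") holding the result of a function 
applied to each of the sub-dataframes defined by whatever grouping variable you 
specify in the second argument. My memory told me that the grouping variable needed 
to be presented as a list, which was why I chose to use the `[` function rather than 
`$` or `[[`. (A plain vector such as olddata$ID is also accepted; the one-column 
dataframe just makes the printed group labels read "ID:".)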
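A minimal sketch comparing the two ways of passing the grouping variable, assuming the olddata built in the question (value column `date`); `first_date` is an illustrative helper name, not something from the thread. As I understand by()'s data-frame method, a non-list INDICES is wrapped in a list internally, so both calls should return the same group results and differ only in how the groups are labelled when printed.

```r
# Example data, as built in the question
ID   <- c(rep(1, 10), rep(2, 6), rep(3, 2))
date <- c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2), rep(5, 2),
          rep(5, 3), rep(6, 3), rep(10, 2))
olddata <- data.frame(ID = ID, date = date)

# Illustrative helper: repeat the group's first date once per row of the group
first_date <- function(x) rep(x$date[1], times = nrow(x))

# INDICES as a one-column data frame (a named list): groups print as "ID: 1", ...
res_list <- by(olddata, olddata["ID"], FUN = first_date)

# INDICES as a plain vector: by() wraps it in a list itself
res_vec <- by(olddata, olddata$ID, FUN = first_date)

# Compare the group results with the element names stripped
identical(unname(unlist(res_list)), unname(unlist(res_vec)))
```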

by(olddata, olddata["ID"], FUN = function(x) { rep(x$date[1], times = nrow(x)) })
#-------------------
ID: 1
 [1] 1 1 1 1 1 1 1 1 1 1
------------------------------------------------------------------------------------
ID: 2
[1] 5 5 5 5 5 5
------------------------------------------------------------------------------------
ID: 3
[1] 10 10

So all you need to do from there is unlist it and assign the result to the new 
named column. (This relies on olddata already being sorted by ID, as after the 
PROC SORT step, because by() returns the groups in sorted ID order.)

olddata$FirstDay <- unlist(
  by(olddata, olddata["ID"], FUN = function(x) { rep(x$date[1], times = nrow(x)) })
)
olddata
#----------------------------
   ID date FirstDay
1   1    1        1
2   1    1        1
3   1    2        1
4   1    2        1
5   1    3        1
6   1    3        1
7   1    4        1
8   1    4        1
9   1    5        1
10  1    5        1
11  2    5        5
12  2    5        5
13  2    5        5
14  2    6        5
15  2    6        5
16  2    6        5
17  3   10       10
18  3   10       10
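Two other base R idioms produce the same column without unlist(). This is a sketch under the same assumptions (value column `date`, rows sorted by ID and then date, as PROC SORT would leave them); `first_by_match` is an illustrative name.

```r
# Example data, as built in the question
ID   <- c(rep(1, 10), rep(2, 6), rep(3, 2))
date <- c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2), rep(5, 2),
          rep(5, 3), rep(6, 3), rep(10, 2))
olddata <- data.frame(ID = ID, date = date)

# PROC SORT equivalent: order rows by ID, then by date within ID
olddata <- olddata[order(olddata$ID, olddata$date), ]

# ave(): applies FUN within each ID group and returns a vector aligned
# with the rows of olddata, so no unlist() is needed
olddata$FirstDay <- ave(olddata$date, olddata$ID, FUN = function(v) v[1])

# match(ID, ID) gives, for every row, the row number of the first
# occurrence of that ID; indexing date with it copies the first date
first_by_match <- olddata$date[match(olddata$ID, olddata$ID)]
```

Unlike the unlist(by(...)) version, ave() puts each group's results back in the original row positions, so it does not rely on the groups being contiguous. The sort is still what makes the "first" row of each ID the one with the smallest date.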

HTH

David.

>   
>
> Thanks
> John
>
>
>
>
> John David Sorkin M.D., Ph.D.
> Professor of Medicine, University of Maryland School of Medicine;
> Associate Director for Biostatistics and Informatics, Baltimore VA Medical 
> Center Geriatrics Research, Education, and Clinical Center;
> PI Biostatistics and Informatics Core, University of Maryland School of 
> Medicine Claude D. Pepper Older Americans Independence Center;
> Senior Statistician University of Maryland Center for Vascular Research;
>
> Division of Gerontology and Palliative Care,
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> Cell phone 443-418-5382
>
>
>
> ______________________________________________
> R-help@r-project.org  mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.