Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

Jeff Newmiller Sun, 03 Jul 2016 12:11:00 -0700

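Typo on the second line: the select() should drop admin_period1, not admin_period, so that admin_period is still there to join on.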

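library(dplyr)
# result0 comes from the quoted code below: one row per ID and admin_period
result <- (   result0
          %>% select( -admin_period1 )  # keep admin_period as the join key
          # pair each period with the next one, whose start becomes this period's end
          %>% inner_join( result0 %>% select( ID, admin_period1, end=start )
                        , by = c( ID="ID", admin_period="admin_period1" )
                        )
          %>% mutate( ddays = end - start )  # days between successive administrations
          )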
-- 
Sent from my phone. Please excuse my brevity.
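
The corrected pipeline can be tried end to end on a small made-up data set. This is only a sketch: the toy rows, the ISO date strings and the `.groups = "drop"` argument are assumptions for illustration, not part of the original thread, and `date` is assumed to be a real Date column. Each administration period is numbered with a running count of "Y" rows, then each period is joined to the one that follows it.

```r
library(dplyr)

# Hypothetical toy data: two subjects, "Y" marks a drug administration visit.
drug_study <- data.frame(
  ID = c("A", "A", "A", "A", "A", "B", "B", "B"),
  date = as.Date(c("2007-05-11", "2007-05-16", "2008-02-04", "2008-02-11",
                   "2009-03-02", "2008-01-07", "2008-01-14", "2009-01-12")),
  drug_admin = c("Y", "", "Y", "", "Y", "Y", "", "Y"),
  stringsAsFactors = FALSE
)

# Running count of "Y" rows within each ID; 0 would mean "before the first Y".
drug_study$admin_period <- ave(as.integer(drug_study$drug_admin == "Y"),
                               drug_study$ID, FUN = cumsum)

# One row per ID and period, holding the date on which the period starts.
result0 <- drug_study %>%
  filter(admin_period != 0) %>%
  group_by(ID, admin_period) %>%
  summarise(start = min(date), .groups = "drop") %>%
  mutate(admin_period1 = admin_period - 1)

# Self-join: period k on the left meets period k + 1 on the right,
# so the start of the next period becomes the end of this one.
result <- result0 %>%
  select(-admin_period1) %>%
  inner_join(result0 %>% select(ID, admin_period1, end = start),
             by = c(ID = "ID", admin_period = "admin_period1")) %>%
  mutate(ddays = end - start)

print(result)
```

The last period of each ID has no successor, so the inner join drops it; a left join would keep it with a missing `end` instead.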


On July 3, 2016 11:55:14 AM PDT, Kevin Wamae <kwa...@kemri-wellcome.org> wrote:
>Hi Jeff, I don’t follow the “like this is Excel” remark. Pardon me for any mix-up.
>
>Thanks for the code.  After running it, this is the error I get.
>
>Error: cannot join on columns 'admin_period' x 'admin_period1': index
>out of bounds
>
>Regards
>-------------------------------------------------------------------------------
>Kevin Wamae | Ph.D. Student (IDeAL)
>KEMRI-Wellcome Trust Collaborative Research Programme
>Centre for Geographic Medicine Research
>P.O. Box 230-80108, Kilifi, Kenya
> 
>
>On 7/3/16, 9:34 PM, "Jeff Newmiller" <jdnew...@dcn.davis.ca.us> wrote:
>
>I still get the impression from your mixing of information types that
>you are thinking like this is Excel.
>
>Perhaps something like
>
>drug_study$admin_period  <- ave( "Y" == drug_study$drug_admin,
>drug_study$ID, FUN=cumsum )
>library(dplyr)
>result0 <- (   drug_study
>          %>% filter( 0 != admin_period )
>          %>% group_by( ID, admin_period )
>          %>% summarise( start = min( date ) )
>          %>% mutate( admin_period1 = admin_period -1 )
>          )
>result <- (   result0 
>          %>% select( -admin_period )
>     %>% inner_join( result0 %>% select( ID, admin_period1, end=start )
>                     , by = c( ID="ID", admin_period ="admin_period1" )
>                        )
>          %>% mutate( ddays = end - start )
>          )
>-- 
>Sent from my phone. Please excuse my brevity.
>
>On July 3, 2016 10:24:51 AM PDT, Kevin Wamae
><kwa...@kemri-wellcome.org> wrote:
>>Hi Jeff, it’s been an uphill task working with the dataset and I am not
>>the first to complain. Nonetheless, data-cleaning is ongoing and since
>>I cannot wait for that to get done, I decided to make the most of what
>>the dataset looks like at this time. It appears the process may take a
>>while.
>>
>>Thanks for the script. From the output, I noticed that “result”
>>contains the first and last date for each individual and does not
>>take the variable “drug_admin” into account.
>>
>>ID        start               end
>>J1/3      1/5/09      12/25/10
>>R1/3      1/4/07      12/15/08
>>R10/1     1/4/07      3/5/12
>>
>>My aim is to pick the date, for example in 2007, where drug_admin ==
>>“Y” as my start and the date in the subsequent year (2008 in this case)
>>where drug_admin == “Y” as my end. Then, I should populate the variable
>>“study_id” with “start” up to the entry just above the one whose date
>>matches “end”, as the output below shows (I hope its structure is
>>maintained as I have copied it from R-Studio). The goal for now is to
>>then get the difference in days between “date” and “study_id” and still
>>keep the “study_id” column, as I might use it later.
>>
>>From the output, it can be seen that for this individual, the dates run
>>from 2007 to 2008. However, for some individuals, the dates run from
>>2008-2009, 2009-2010 and so on. Therefore, I need to make the script
>>deal with all the years, as the dates range from 2001 to 2016.
>>
>>ID    date      drug_admin  year  month  study_id
>>R1/3  5/11/07   Y           2007  5      5/11/07
>>R1/3  5/16/07               2007  5      5/11/07
>>R1/3  5/22/07               2007  5      5/11/07
>>R1/3  5/28/07               2007  5      5/11/07
>>R1/3  6/5/07                2007  6      5/11/07
>>R1/3  6/11/07               2007  6      5/11/07
>>R1/3  6/18/07               2007  6      5/11/07
>>R1/3  6/25/07               2007  6      5/11/07
>>R1/3  7/2/07                2007  7      5/11/07
>>R1/3  7/16/07               2007  7      5/11/07
>>R1/3  7/29/07               2007  7      5/11/07
>>R1/3  8/2/07                2007  8      5/11/07
>>R1/3  8/7/07                2007  8      5/11/07
>>R1/3  8/13/07               2007  8      5/11/07
>>R1/3  9/18/07               2007  9      5/11/07
>>R1/3  9/24/07               2007  9      5/11/07
>>R1/3  10/6/07               2007  10     5/11/07
>>R1/3  10/8/07               2007  10     5/11/07
>>R1/3  10/15/07              2007  10     5/11/07
>>R1/3  10/22/07              2007  10     5/11/07
>>R1/3  10/29/07              2007  10     5/11/07
>>R1/3  11/8/07               2007  11     5/11/07
>>R1/3  11/12/07              2007  11     5/11/07
>>R1/3  11/19/07              2007  11     5/11/07
>>R1/3  11/29/07              2007  11     5/11/07
>>R1/3  12/6/07               2007  12     5/11/07
>>R1/3  12/10/07              2007  12     5/11/07
>>R1/3  12/21/07              2007  12     5/11/07
>>R1/3  1/7/08                2008  1      5/11/07
>>R1/3  1/14/08               2008  1      5/11/07
>>R1/3  1/21/08               2008  1      5/11/07
>>R1/3  1/28/08               2008  1      5/11/07
>>R1/3  2/4/08    Y           2008  2
>>
>>
>>Regards
>>-------------------------------------------------------------------------------
>>Kevin Wamae
>>
>>
>>
>>On 7/3/16, 7:05 PM, "Jeff Newmiller" <jdnew...@dcn.davis.ca.us> wrote:
>>
>>result <- setNames(
>>  data.frame( aggregate( date ~ ID, data = drug_study, FUN = min ),
>>              aggregate( date ~ ID, data = drug_study, FUN = max )[2] ),
>>  c( "ID", "start", "end" ) )
>>
>>
>>______________________________________________________________________
>>
>>This e-mail contains information which is confidential. It is intended
>>only for the use of the named recipient. If you have received this
>>e-mail in error, please let us know by replying to the sender, and
>>immediately delete it from your system.  Please note, that in these
>>circumstances, the use, disclosure, distribution or copying of this
>>information is strictly prohibited. KEMRI-Wellcome Trust Programme
>>cannot accept any responsibility for the  accuracy or completeness of
>>this message as it has been transmitted over a public network. Although
>>the Programme has taken reasonable precautions to ensure no viruses are
>>present in emails, it cannot accept responsibility for any loss or
>>damage arising from the use of the email or attachments. Any views
>>expressed in this message are those of the individual sender, except
>>where the sender specifically states them to be the views of
>>KEMRI-Wellcome Trust Programme.
>>______________________________________________________________________

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
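
The goal described in the quoted message, carrying each administration date forward as "study_id" until the row just above the next "Y" and then taking the difference in days, can also be sketched in base R. This is a minimal illustration under stated assumptions: the toy rows are made up, `date` is already a Date (strings such as 5/11/07 would first need `as.Date(x, format = "%m/%d/%y")`), and the helper names `period`, `key` and `start_num` are hypothetical.

```r
# Hypothetical toy data in the shape described in the thread.
drug_study <- data.frame(
  ID = c("R1/3", "R1/3", "R1/3", "R1/3", "R1/3", "R1/3"),
  date = as.Date(c("2007-05-04", "2007-05-11", "2007-05-16", "2007-05-22",
                   "2008-02-04", "2008-02-11")),
  drug_admin = c("", "Y", "", "", "Y", ""),
  stringsAsFactors = FALSE
)

# Sort so that each "Y" row is the first row of its period.
drug_study <- drug_study[order(drug_study$ID, drug_study$date), ]

# Running count of "Y" rows within each ID; 0 means before the first administration.
period <- ave(as.integer(drug_study$drug_admin == "Y"), drug_study$ID, FUN = cumsum)

# One key per (ID, period); the first date in each group is that period's start.
key <- paste(drug_study$ID, period)
start_num <- ave(as.numeric(drug_study$date), key, FUN = function(d) d[1])
start_num[period == 0] <- NA  # rows before any administration have no start date

# Convert back to Date and compute days elapsed since the period started.
drug_study$study_id <- as.Date(start_num, origin = "1970-01-01")
drug_study$ddays <- as.numeric(drug_study$date - drug_study$study_id)

print(drug_study)
```

Working on the numeric form of the dates inside `ave()` avoids relying on how the Date class survives the split-and-reassign step; the column is turned back into a Date afterwards.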
