Re: [R] alternative for multiple if_else statements

Kevin Wamae Thu, 22 Feb 2018 21:13:22 -0800

Dear Eric, thank you for that observation.

I realised that some of the participants had duplicated “survey_start” dates; 
once I corrected this, the code worked.
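
For anyone hitting the same issue, a check along the following lines may help 
locate duplicated start flags. It is only a sketch on made-up data: the column 
names id, year and survey_start follow the ones used in this thread, while the 
names visits, flagged, counts, n_starts and dupes are hypothetical.

```r
# Toy data (invented): id_1 has two rows flagged as the 2013 survey start.
visits <- data.frame(
  id = c("id_1", "id_1", "id_1", "id_2", "id_2"),
  year = c(2013L, 2013L, 2014L, 2013L, 2014L),
  survey_start = c("Y", "Y", "Y", "Y", ""),
  stringsAsFactors = FALSE
)

# Keep only the rows flagged as a survey start.
flagged <- visits[visits$survey_start == "Y", ]

# Count the flags per participant and year.
counts <- aggregate(survey_start ~ id + year, data = flagged, FUN = length)
names(counts)[3] <- "n_starts"

# Participant-years with more than one start flag need attention.
dupes <- counts[counts$n_starts > 1, ]
print(dupes)
```

Any row left in dupes marks a participant and year with more than one 
"survey_start" entry.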

Regards
------------------
Kevin Wamae
From: Eric Berger <ericjber...@gmail.com>
Date: Thursday, 22 February 2018 at 15:16
To: Kevin Wamae <kwa...@kemri-wellcome.org>
Cc: "R-help@r-project.org" <R-help@r-project.org>
Subject: Re: [R] alternative for multiple if_else statements

Hi Kevin,
I ran the code on the full data set and was able to reproduce the problem that 
you are facing.
My guess is that you have an error in your intuition and/or logic, and that 
this relates to the use of the subscript [1].
Specifically, on the full dataset, the condition
trialData$date[trialData$survey_start == "Y" & trialData$year == 2013 & 
trialData$site == "site_1"]

yields 412 matches, of which there are 9 unique ones, specifically

April 2,3,4,5,8,10,11,16,17

In the full data set the first element returned, i.e. the one selected by the 
subscript [1], is "2013-04-04".

In the filtered data set the first element that appears is "2013-04-05".
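
The effect can be reproduced on a toy example. This is a sketch with invented 
rows, not the real data: the object names d, hits and filtered are 
hypothetical, and min() is shown only as one possible order-independent 
alternative to [1].

```r
# Toy data (invented): three start rows that are not in date order.
d <- data.frame(
  date = as.Date(c("2013-04-04", "2013-04-02", "2013-04-05")),
  survey_start = c("Y", "Y", "Y"),
  year = c(2013L, 2013L, 2013L),
  site = c("site_1", "site_1", "site_1"),
  stringsAsFactors = FALSE
)

hits <- d$date[d$survey_start == "Y" & d$year == 2013 & d$site == "site_1"]

# [1] returns whichever match comes first in row order.
first_in_row_order <- hits[1]

# min() returns the earliest date regardless of row order.
earliest <- min(hits)

# Filtering the rows changes which match comes first.
filtered <- d[3, ]
first_after_filter <- filtered$date[filtered$survey_start == "Y"][1]

print(first_in_row_order)
print(earliest)
print(first_after_filter)
```

If the intended boundary is the earliest start date, min() avoids the 
dependence on how the rows happen to be sorted.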

I hope that is enough information for you to make further progress from here.

Best,
Eric

On Thu, Feb 22, 2018 at 1:28 PM, Kevin Wamae <kwa...@kemri-wellcome.org> wrote:
Dear Eric, wow, this seems to do the trick. But I have encountered a problem.

I have tested it on the larger dataset: it works on a filtered dataset but not 
on the whole dataset (attached). See the script below.

#load packages
library(dplyr)
library(data.table) # provides fread()

#load data
trialData <- fread("trialData.txt") %>% mutate(date = as.Date(date,"%d/%m/%Y"))

#create blank variable
trialData$survey_year <- rep(NA_character_, nrow(trialData))

#attempt 1 fails: code for survey
trialData$survey_year[trialData$date >= trialData$date[trialData$survey_start 
== "Y" & trialData$year == 2013 & trialData$site == "site_1"][1] & 
trialData$date < trialData$date[trialData$month == 4 & trialData$year == 2014 & 
trialData$site == "site_1"][1]] <- "survey_2013"

#filter trialData
trialData <- trialData %>% filter(id == "id_786/3")

#attempt 2 works: code for survey
trialData$survey_year[trialData$date >= trialData$date[trialData$survey_start 
== "Y" & trialData$year == 2013 & trialData$site == "site_1"][1] & 
trialData$date < trialData$date[trialData$month == 4 & trialData$year == 2014 & 
trialData$site == "site_1"][1]] <- "survey_2013"

From: Eric Berger <ericjber...@gmail.com>
Date: Thursday, 22 February 2018 at 13:05
To: Kevin Wamae <kwa...@kemri-wellcome.org>
Cc: "R-help@r-project.org" <R-help@r-project.org>
Subject: Re: [R] alternative for multiple if_else statements

Hi,
1. I think the different ordering leads to different results for the following 
reason:
    date[ some condition is true ][1]
    will give you an NA if there are no rows where 'some condition holds'.
    In the code that 'works' you don't have such a situation, but in the code 
    that 'does not work' you presumably hit an NA before you get to the result 
    that you really want.
2. I am not a big fan of your "nested if" layout. I think you could rewrite it 
more clearly - and without nesting - with something like

     > trialData$survey_year <- rep(NA_character_, nrow(trialData))
     > trialData$survey_year[ condition for survey_2007 ] <- "survey_2007"
     > trialData$survey_year[ condition for survey_2008 ] <- "survey_2008"
     > etc
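
Both points can be tried out on a small example. The following is a sketch on 
made-up data, not a definitive implementation: the dates are invented, and the 
use of which() to drop NA comparisons is an assumption about how one might 
guard the assignment.

```r
# Toy data (invented): one flagged start in 2014, nothing for 2013.
trialData <- data.frame(
  date = as.Date(c("2014-01-08", "2014-05-05", "2014-09-01", "2015-02-10")),
  year = c(2014L, 2014L, 2014L, 2015L),
  survey_start = c("", "Y", "", ""),
  stringsAsFactors = FALSE
)

# No row matches 2013, so the subset is empty and [1] yields NA.
start_2013 <- trialData$date[trialData$survey_start == "Y" &
                             trialData$year == 2013][1]
start_2014 <- trialData$date[trialData$survey_start == "Y" &
                             trialData$year == 2014][1]
print(is.na(start_2013))

# Non-nested layout: start from NA and fill one survey at a time.
trialData$survey_year <- rep(NA_character_, nrow(trialData))

# which() drops the NA comparisons, so a missing start assigns nothing.
trialData$survey_year[which(trialData$date >= start_2013)] <- "survey_2013"
trialData$survey_year[which(trialData$date >= start_2014)] <- "survey_2014"

print(trialData$survey_year)
```

Because each survey is assigned in its own statement, a missing start date for 
one year leaves the other years unaffected.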

HTH,
Eric

On Wed, Feb 21, 2018 at 10:33 PM, Kevin Wamae <kwa...@kemri-wellcome.org> wrote:
Hi, I am having trouble figuring out why if_else is behaving the way it is. It 
may be my code or the way the data is structured.

Below is a snapshot of a database I am working on. It represents a 
longitudinal survey of study participants in a trial with weekly follow-up.

The variable "survey_start" represents the start of the study-defined one year 
follow up (which we called "survey_year").

I am trying to populate all subsequent entries for each participant, per survey 
year, with the entry "survey" followed by an underscore and the respective 
year, eg. survey_2014.

There are missing entries: for example, the participant represented here wasn't 
available at the start of the 2015 survey. Also, some participants don’t have 
complete one-year follow-ups, but I still need to include them.

I have written two versions of the code. The first one fails while the second 
works; the only differences are that the second populates the entries in 
reverse order (2016 down to 2007 instead of 2007 up to 2016) and omits the 
if_else statement for 2015. I also noticed that the second version, which spans 
the years 2007-2016 (less 2015), fails if a participant's entries run from 2010 
to 2016.

Kindly assist in figuring this out...or better yet, an alternative.
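
One possible alternative to a chain of conditions is an interval lookup with 
base R's findInterval(). The sketch below rests on assumptions that are not 
part of the original post: the survey start dates are already known as a 
sorted vector (the dates here are invented), and each survey runs until the 
next start date. The names starts, labels, visits and idx are hypothetical.

```r
# Assumed survey start dates, sorted in increasing order (invented values).
starts <- as.Date(c("2014-05-05", "2015-04-20", "2016-03-14"))
labels <- paste0("survey_", format(starts, "%Y"))

# Visit dates to be labelled (invented values).
visits <- as.Date(c("2014-01-08", "2014-05-05", "2015-01-15",
                    "2015-06-01", "2016-06-27"))

# findInterval() returns, for each visit, the index of the latest start
# that is on or before it, or 0 when the visit precedes every start.
idx <- findInterval(as.numeric(visits), as.numeric(starts))

# Shift by one so that index 0 maps to NA (no survey yet).
survey_year <- c(NA_character_, labels)[idx + 1L]

print(data.frame(visits, survey_year))
```

A participant who missed the start of one survey still gets the right label 
here, because the boundaries come from the lookup vector rather than from that 
participant's own rows.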

    trialData <- structure(list(study = c("site_1", "site_1", "site_1", 
"site_1",
"site_1", "site_1", "site_1", "site_1", "site_1", "site_1", "site_1",
"site_1", "site_1", "site_1", "site_1", "site_1", "site_1", "site_1",
"site_1", "site_1", "site_1", "site_1", "site_1", "site_1", "site_1",
"site_1", "site_1", "site_1", "site_1", "site_1", "site_1", "site_1",
"site_1", "site_1", "site_1", "site_1", "site_1", "site_1", "site_1",
"site_1", "site_1", "site_1", "site_1", "site_1", "site_1", "site_1",
"site_1", "site_1", "site_1", "site_1", "site_1", "site_1", "site_1",
"site_1", "site_1", "site_1", "site_1", "site_1", "site_1", "site_1",
"site_1", "site_1", "site_1", "site_1", "site_1", "site_1", "site_1",
"site_1", "site_1", "site_1", "site_1", "site_1", "site_1", "site_1",
"site_1", "site_1", "site_1", "site_1", "site_1", "site_1", "site_1",
"site_1", "site_1", "site_1", "site_1", "site_1", "site_1", "site_1",
"site_1", "site_1", "site_1", "site_1", "site_1", "site_1", "site_1",
"site_1", "site_1", "site_1", "site_1", "site_1", "site_1", "site_1",
"site_1", "site_1", "site_1", "site_1", "site_1", "site_1", "site_1",
"site_1", "site_1", "site_1", "site_1", "site_1", "site_1", "site_1",
"site_1", "site_1", "site_1", "site_1", "site_1", "site_1", "site_1",
"site_1", "site_1"), studyno = c("child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1", "child_1", "child_1", "child_1", "child_1",
"child_1", "child_1"), date = structure(c(16078, 16085, 16092,
16098, 16104, 16115, 16121, 16129, 16135, 16140, 16146, 16156,
16162, 16168, 16177, 16185, 16191, 16195, 16203, 16210, 16217,
16225, 16234, 16237, 16246, 16253, 16262, 16269, 16278, 16283,
16288, 16297, 16304, 16311, 16319, 16326, 16332, 16337, 16346,
16353, 16360, 16366, 16370, 16381, 16384, 16395, 16399, 16407,
16415, 16422, 16444, 16452, 16454, 16467, 16474, 16477, 16484,
16490, 16501, 16508, 16514, 16520, 16529, 16533, 16539, 16550,
16556, 16564, 16566, 16578, 16582, 16593, 16599, 16604, 16613,
16620, 16623, 16635, 16636, 16654, 16660, 16666, 16673, 16681,
16688, 16693, 16702, 16706, 16714, 16721, 16728, 16734, 16745,
16749, 16757, 16764, 16769, 16778, 16785, 16792, 16805, 16812,
16819, 16830, 16832, 16839, 16846, 16856, 16862, 16867, 16877,
16884, 16890, 16898, 16904, 16912, 16917, 16923, 16936, 16938,
16953, 16960, 16966, 16973, 16980), class = "Date"), year = c(2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L,
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L,
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L), month = c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L,
5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L,
8L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L,
12L, 12L, 12L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L,
7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 11L,
11L, 11L, 11L, 11L, 12L, 12L, 12L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L,
6L, 6L), survey_start = c("", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "Y", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "Y", "", "", "", "", "", "", "", "",
"", "", "", "", "", "")), class = "data.frame", row.names = c(NA,
-125L), .Names = c("study", "studyno", "date", "year", "month",
"survey_start"))

code 1 fails:

trialData <- trialData %>% arrange(studyno, date) %>% group_by(studyno) %>%
mutate(survey_year = if_else(date >= date[survey_start == "Y" & year == 2007 & 
study == "site_1"][1] & date < date[month == 5 & year == 2008 & study == 
"site_1"][1], "survey_2007",
                     if_else(date >= date[survey_start == "Y" & year == 2008 & 
study == "site_1"][1] & date < date[month == 4 & year == 2009 & study == 
"site_1"][1], "survey_2008",
                     if_else(date >= date[survey_start == "Y" & year == 2009 & 
study == "site_1"][1] & date < date[month == 5 & year == 2010 & study == 
"site_1"][1], "survey_2009",
                     if_else(date >= date[survey_start == "Y" & year == 2010 & 
study == "site_1"][1] & date < date[month == 5 & year == 2011 & study == 
"site_1"][1], "survey_2010",
                     if_else(date >= date[survey_start == "Y" & year == 2011 & 
study == "site_1"][1] & date < date[month == 4 & year == 2012 & study == 
"site_1"][1], "survey_2011",
                     if_else(date >= date[survey_start == "Y" & year == 2012 & 
study == "site_1"][1] & date < date[month == 4 & year == 2013 & study == 
"site_1"][1], "survey_2012",
                     if_else(date >= date[survey_start == "Y" & year == 2013 & 
study == "site_1"][1] & date < date[month == 4 & year == 2014 & study == 
"site_1"][1], "survey_2013",
                     if_else(date >= date[survey_start == "Y" & year == 2014 & 
study == "site_1"][1] & date < date[month == 4 & year == 2015 & study == 
"site_1"][1], "survey_2014",
                     if_else(date >= date[survey_start == "Y" & year == 2015 & 
study == "site_1"][1] & date < date[month == 3 & year == 2016 & study == 
"site_1"][1], "survey_2015",
                     if_else(date >= date[survey_start == "Y" & year == 2016 & 
study == "site_1"][1], "survey_2016","")))))))))))
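
A likely reason for the failure is that a missing condition propagates through 
the whole nest. The sketch below uses invented dates and base ifelse() so that 
it runs without extra packages; dplyr's if_else() likewise returns NA wherever 
its condition is NA. The names oldest_first and newest_first are hypothetical.

```r
# Toy participant with visits only from 2014 onward (invented dates).
date <- as.Date(c("2014-01-08", "2014-06-02", "2016-06-27"))
year <- as.integer(format(date, "%Y"))
survey_start <- c("", "Y", "Y")

# No row matches 2007, so the subset is empty and [1] returns NA.
start_2007 <- date[survey_start == "Y" & year == 2007][1]
start_2016 <- date[survey_start == "Y" & year == 2016][1]

# Comparing against NA gives NA on every row, so the outer test is NA
# everywhere and the result is NA regardless of the inner branches.
oldest_first <- ifelse(date >= start_2007, "survey_2007",
                ifelse(date >= start_2016, "survey_2016", ""))

# With a test that can be evaluated placed first, rows that satisfy it
# receive their label before any NA condition is reached.
newest_first <- ifelse(date >= start_2016, "survey_2016",
                ifelse(date >= start_2007, "survey_2007", ""))

print(oldest_first)
print(newest_first)
```

Note that the reversed order only rescues the rows matched before an NA 
condition is reached; the remaining rows still come out as NA.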

code 2 works:

    trialData <- trialData %>% arrange(studyno, date) %>% group_by(studyno) %>%
  mutate(survey_year = if_else(date >= date[survey_start == "Y" & year == 2016 
& study == "site_1"][1], "survey_2016",
                           if_else(date >= date[survey_start == "Y" & year == 
2014 & study == "site_1"][1] & date < date[month == 4 & year == 2015 & study == 
"site_1"][1], "survey_2014",
                           if_else(date >= date[survey_start == "Y" & year == 
2013 & study == "site_1"][1] & date < date[month == 4 & year == 2014 & study == 
"site_1"][1], "survey_2013",
                           if_else(date >= date[survey_start == "Y" & year == 
2012 & study == "site_1"][1] & date < date[month == 4 & year == 2013 & study == 
"site_1"][1], "survey_2012",
                           if_else(date >= date[survey_start == "Y" & year == 
2011 & study == "site_1"][1] & date < date[month == 4 & year == 2012 & study == 
"site_1"][1], "survey_2011",
                           if_else(date >= date[survey_start == "Y" & year == 
2010 & study == "site_1"][1] & date < date[month == 5 & year == 2011 & study == 
"site_1"][1], "survey_2010",
                           if_else(date >= date[survey_start == "Y" & year == 
2009 & study == "site_1"][1] & date < date[month == 5 & year == 2010 & study == 
"site_1"][1], "survey_2009",
                           if_else(date >= date[survey_start == "Y" & year == 
2008 & study == "site_1"][1] & date < date[month == 4 & year == 2009 & study == 
"site_1"][1], "survey_2008",
                           if_else(date >= date[survey_start == "Y" & year == 
2007 & study == "site_1"][1] & date < date[month == 5 & year == 2008 & study == 
"site_1"][1], "survey_2007",""))))))))))

______________________________________________________________________

This e-mail contains information which is confidential. It is intended only for 
the use of the named recipient. If you have received this e-mail in error, 
please let us know by replying to the sender, and immediately delete it from 
your system.  Please note, that in these circumstances, the use, disclosure, 
distribution or copying of this information is strictly prohibited. 
KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  
accuracy or completeness of this message as it has been transmitted over a 
public network. Although the Programme has taken reasonable precautions to 
ensure no viruses are present in emails, it cannot accept responsibility for 
any loss or damage arising from the use of the email or attachments. Any views 
expressed in this message are those of the individual sender, except where the 
sender specifically states them to be the views of KEMRI-Wellcome Trust 
Programme.
______________________________________________________________________

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org<mailto:R-help@r-project.org> mailing list -- To 
UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

