Many thanks, Ista and Bert, for your nice solutions!
As Ista commented in a previous mail, the 0.87 value in my example is not fixed; for each subject it depends on the difference "2007-01-01 - fini". Both of your solutions, however, take this into account.

Frank S.

________________________________
From: Bert Gunter <bgunter.4...@gmail.com>
Sent: Monday, 26 September 2016 23:18:52
To: Ista Zahn
Cc: Frank S.; r-help@r-project.org
Subject: Re: [R] Using lapply in R data table

... and just for fun, here's an alternative in which mapply() is used to
vectorize switch(); again, whether you like it may be just a matter of
taste, although I suspect it might be less efficient than ifelse(), which
is already vectorized:

DT <- within(DT, exposure <- {
    mapply(function(x, fac) switch(as.character(fac),
               a = 1,
               b = difftime(as.Date("2007-01-01"), x, units="days")/365.25,
               c = .5),
           x = fini,
           fac = cut(fini, as.Date(c("2000-01-01","2006-01-01","2006-06-30","2006-12-21")),
                     labels = letters[1:3]))
})

> DT
  id       fini group  exposure
1  2 2005-04-20     A 1.0000000
2  2 2005-04-20     A 1.0000000
3  2 2005-04-20     A 1.0000000
4  5 2006-02-19     B 0.8651608
5  5 2006-06-29     B 0.5092402
6  7 2006-10-08     A 0.5000000
7  7 2006-10-08     A 0.5000000

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, Sep 26, 2016 at 1:27 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
> Ista:
>
> Aha -- now I see the point. My bad. You are right. I was careless.
>
> However, cut() with ifelse() might simplify the code a bit and/or make
> it more readable. To be clear, this is just a matter of taste; e.g.
> using your data and a data frame instead of a data table:
>
> > DT <- within(DT,
>     exposure <- {
>         f <- cut(fini, as.Date(c("2000-01-01","2006-01-01","2006-06-30","2006-12-21")),
>                  labels = letters[1:3])
>         ifelse(f == "a", 1,
>                ifelse(f == "c", .5,
>                       difftime(as.Date("2007-01-01"), fini, units="days")/365.25))
>     }
>   )
>
> > DT
>   id       fini group  exposure f
> 1  2 2005-04-20     A 1.0000000 a
> 2  2 2005-04-20     A 1.0000000 a
> 3  2 2005-04-20     A 1.0000000 a
> 4  5 2006-02-19     B 0.8651608 b
> 5  5 2006-06-29     B 0.5092402 b
> 6  7 2006-10-08     A 0.5000000 c
> 7  7 2006-10-08     A 0.5000000 c
>
> Bert Gunter
>
> On Mon, Sep 26, 2016 at 12:07 PM, Ista Zahn <istaz...@gmail.com> wrote:
>> On Mon, Sep 26, 2016 at 2:48 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>> I thought that that was a typo from the OP, as it disagrees with his
>>> example. But the labels are arbitrary, so in fact cut() will do it
>>> whichever way he meant.
>>
>> I don't see how cut will do it, at least not conveniently.
>> Consider this slightly altered example:
>>
>> library(data.table)
>> DT <- data.table(
>>   id = rep(c(2, 5, 7), c(3, 2, 2)),
>>   fini = rep(as.Date(c('2005-04-20',
>>                        '2006-02-19',
>>                        '2006-06-29',
>>                        '2006-10-08')),
>>              c(3, 1, 1, 2)),
>>   group = rep(c("A", "B", "A"), c(3, 2, 2)) )
>>
>> DT[, exposure := vector(mode = "numeric", length = .N)]
>> DT[fini < as.Date("2006-01-01"), exposure := 1]
>> DT[fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30"),
>>    exposure := difftime(as.Date("2007-01-01"), fini, units="days")/365.25]
>> DT[fini >= as.Date("2006-07-01"), exposure := 0.5]
>>
>> DT
>>
>> ##    id       fini group  exposure
>> ## 1:  2 2005-04-20     A 1.0000000
>> ## 2:  2 2005-04-20     A 1.0000000
>> ## 3:  2 2005-04-20     A 1.0000000
>> ## 4:  5 2006-02-19     B 0.8651608
>> ## 5:  5 2006-06-29     B 0.5092402
>> ## 6:  7 2006-10-08     A 0.5000000
>> ## 7:  7 2006-10-08     A 0.5000000
>>
>> Best,
>> Ista
>>
>>>
>>> -- Bert
>>> Bert Gunter
>>>
>>> On Mon, Sep 26, 2016 at 11:37 AM, Ista Zahn <istaz...@gmail.com> wrote:
>>>> On Mon, Sep 26, 2016 at 1:59 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>>> This seems like a job for cut() .
>>>>
>>>> I thought that at first too, but the middle group shouldn't be .87 but
>>>> rather
>>>>
>>>> "exposure" = "2007-01-01" - "fini"
>>>>
>>>> so I think cut alone won't do it.
>>>>
>>>> Best,
>>>> Ista
>>>>>
>>>>> (I made DT a data frame to avoid loading the data table package. But I
>>>>> assume it would work with a data table too. Check this, though!)
>>>>>
>>>>> > DT <- within(DT, exposure <-
>>>>>       cut(fini, as.Date(c("2000-01-01","2006-01-01","2006-06-30","2006-12-21")),
>>>>>           labels = c(1,.87,.5)))
>>>>>
>>>>> > DT
>>>>>   id       fini group exposure
>>>>> 1  2 2005-04-20     A        1
>>>>> 2  2 2005-04-20     A        1
>>>>> 3  2 2005-04-20     A        1
>>>>> 4  5 2006-02-19     B     0.87
>>>>> 5  5 2006-02-19     B     0.87
>>>>> 6  7 2006-10-08     A      0.5
>>>>> 7  7 2006-10-08     A      0.5
>>>>>
>>>>> (but note that exposure is a factor, not numeric)
>>>>>
>>>>> Cheers,
>>>>> Bert
>>>>>
>>>>> Bert Gunter
>>>>>
>>>>> On Mon, Sep 26, 2016 at 10:05 AM, Ista Zahn <istaz...@gmail.com> wrote:
>>>>>> Hi Frank,
>>>>>>
>>>>>> lapply(DT) iterates over each column. That doesn't seem to be what you
>>>>>> want.
>>>>>>
>>>>>> There are probably better ways, but here is one approach.
>>>>>>
>>>>>> DT[, exposure := vector(mode = "numeric", length = .N)]
>>>>>> DT[fini < as.Date("2006-01-01"), exposure := 1]
>>>>>> DT[fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30"),
>>>>>>    exposure := difftime(as.Date("2007-01-01"), fini,
>>>>>>                         units="days")/365.25]
>>>>>> DT[fini >= as.Date("2006-07-01"), exposure := 0.5]
>>>>>>
>>>>>> Best,
>>>>>> Ista
>>>>>>
>>>>>> On Mon, Sep 26, 2016 at 11:28 AM, Frank S.
>>>>>> <f_j_...@hotmail.com> wrote:
>>>>>>> Dear all,
>>>>>>>
>>>>>>> I have an R data table like this:
>>>>>>>
>>>>>>> DT <- data.table(
>>>>>>>   id = rep(c(2, 5, 7), c(3, 2, 2)),
>>>>>>>   fini = rep(as.Date(c('2005-04-20', '2006-02-19', '2006-10-08')), c(3, 2, 2)),
>>>>>>>   group = rep(c("A", "B", "A"), c(3, 2, 2)) )
>>>>>>>
>>>>>>> I want to construct a new variable "exposure" defined as follows:
>>>>>>>
>>>>>>> 1) If "fini" earlier than 2006-01-01 --> "exposure" = 1
>>>>>>> 2) If "fini" in [2006-01-01, 2006-06-30] --> "exposure" = "2007-01-01" - "fini"
>>>>>>> 3) If "fini" in [2006-07-01, 2006-12-31] --> "exposure" = 0.5
>>>>>>>
>>>>>>> So the desired output would be the following data table:
>>>>>>>
>>>>>>>    id       fini exposure group
>>>>>>> 1:  2 2005-04-20     1.00     A
>>>>>>> 2:  2 2005-04-20     1.00     A
>>>>>>> 3:  2 2005-04-20     1.00     A
>>>>>>> 4:  5 2006-02-19     0.87     B
>>>>>>> 5:  5 2006-02-19     0.87     B
>>>>>>> 6:  7 2006-10-08     0.50     A
>>>>>>> 7:  7 2006-10-08     0.50     A
>>>>>>>
>>>>>>> I have tried:
>>>>>>>
>>>>>>> DT <- DT[ , list(id, fini, exposure = 0, group)]
>>>>>>> DT.new <- lapply(DT, function(exposure){
>>>>>>>   exposure[fini < as.Date("2006-01-01")] <- 1 # 1st case
>>>>>>>   exposure[fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30")] <-
>>>>>>>     difftime(as.Date("2007-01-01"), fini, units="days")/365.25 # 2nd case
>>>>>>>   exposure[fini >= as.Date("2006-07-01") & fini <= as.Date("2006-12-31")] <- 0.5 # 3rd case
>>>>>>>   exposure # return value
>>>>>>> })
>>>>>>>
>>>>>>> But I get an error message.
>>>>>>>
>>>>>>> Frank S.
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
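
A note on interval boundaries for anyone reusing the cut() variants in this thread: cut() on Date values defaults to right = FALSE, so every bin is closed on the left and open on the right. With a break at 2006-06-30, a fini of exactly 2006-06-30 therefore lands in the third bin, and dates after 2006-12-21 get NA. Below is a minimal base-R sketch, assuming breaks at 2006-07-01 and 2007-01-01 so that the bins line up with the three cases in Frank's original question. Those two break dates are my assumption, not something posted in the thread; the lower break 2000-01-01 is borrowed from Bert's code, and the names breaks, f and years_left are only illustrative.

```r
# Base-R sketch: DT is a plain data frame built from Ista's altered example.
DT <- data.frame(
  id    = rep(c(2, 5, 7), c(3, 2, 2)),
  fini  = rep(as.Date(c("2005-04-20", "2006-02-19", "2006-06-29", "2006-10-08")),
              c(3, 1, 1, 2)),
  group = rep(c("A", "B", "A"), c(3, 2, 2))
)

# cut() on Dates uses right = FALSE by default, so each bin is [lower, upper).
# Assumed breaks: chosen so the bins match cases 1) to 3) of the question.
breaks <- as.Date(c("2000-01-01", "2006-01-01", "2006-07-01", "2007-01-01"))
f <- cut(DT$fini, breaks, labels = c("a", "b", "c"))

# Years between fini and 2007-01-01, converted from difftime to a plain number.
years_left <- as.numeric(difftime(as.Date("2007-01-01"), DT$fini,
                                  units = "days")) / 365.25

# Nested ifelse() is already vectorized, as in Bert's second suggestion.
DT$exposure <- ifelse(f == "a", 1,
               ifelse(f == "c", 0.5, years_left))
DT
```

On this data the sketch gives the same exposure values as the solutions posted in the thread (for example 0.8651608 for 2006-02-19), and exposure comes out as an ordinary numeric column. Whether half-open bins are preferable to the explicit >= and <= comparisons in the data.table version is, again, a matter of taste.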