Re: [R] Creating a conditional lag variable in R

Faradj Koliev Sat, 27 Jul 2019 06:00:47 -0700

Thank you all. I now have the right solution for this (perhaps of interest to 
some):


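library(dplyr)

# Flag rows that fall within the k years before an agreement starts:
# 1 if an agreement starts in this row or in one of the next k rows,
# except that the start rows themselves are set to 0.
check_pre <- function(idx, k) {
  n <- length(idx)
  pre_vec <- sapply(seq_len(n), function(x) +any(idx[x:min(x + k, n)] %in% 1))
  pre_vec[idx == 1] <- 0
  pre_vec
}

# Assumes rows are sorted by year within each country.
data %>%
  group_by(country) %>%
  mutate(
    # 1 in the year the agreement is signed: a 0 -> 1 switch, or a 1 in the
    # country's first observed year
    idx = +((lag(X1) == 0 & X1 == 1) | (row_number() == 1 & X1 == 1)),
    X1_pre4 = check_pre(idx, 4),
    X1_pre5 = check_pre(idx, 5),
    idx = NULL
  )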
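A quick way to see what check_pre() returns is to run it on single-country start indicators in base R (no dplyr needed). The toy vectors below are made up for illustration; early_start mimics country A, where fewer than 4 pre-agreement years are observed.

```r
# Same logic as check_pre() above: a row is flagged if an agreement starts in
# that row or in one of the next k rows; the start rows themselves get 0.
check_pre <- function(idx, k) {
  n <- length(idx)
  pre_vec <- vapply(seq_len(n),
                    function(x) +any(idx[x:min(x + k, n)] %in% 1),
                    integer(1))
  pre_vec[idx == 1] <- 0L
  pre_vec
}

late_start  <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0)  # hypothetical: start in year 9
early_start <- c(0, 0, 1, 0, 0)                    # hypothetical: only 2 pre-years

check_pre(late_start, 4)   # -> 0 0 0 0 1 1 1 1 0 0 0
check_pre(late_start, 5)   # -> 0 0 0 1 1 1 1 1 0 0 0
check_pre(early_start, 4)  # -> 1 1 0 0 0
```

Because the window x:min(x + k, n) spans k + 1 rows including the current one, k = 4 flags exactly the 4 rows before each start, and fewer when the series begins later.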


> On 27 Jul 2019, at 10:45, Faradj Koliev <farad...@gmail.com> wrote:
> 
> Peter Dalgaard, 
> 
> Thanks for this. 
> 
> I’ll try to think of ways to apply this logic. At the moment, I’m trying to
> do this with “mutate” from the dplyr package, but it’s not easy.
> 
>> On 27 Jul 2019, at 10:33, peter dalgaard <pda...@gmail.com> wrote:
>> 
>> Some pointers (not tested, may contain blunders...)
>> 
>> (a) you likely need some sort of split-operate-unsplit construct, by 
>> country. E.g.,
>> 
>> myfun <- function(d) {....operate on data frame with only one country....} 
>> ll <- split(data, data$country)
>> ll.new <- lapply(ll, myfun)
>> data.new <- unsplit(ll.new, data$country)
>> 
>> (There might be a tidyverse idiom for this too)
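Pointer (a) could look like this in base R. The per-country function mark_start() and the toy data frame are hypothetical illustrations: mark_start() flags the year a country first signs (a 0 to 1 switch, or a 1 in its first observed year), assuming each country's rows are sorted by year.

```r
# Hypothetical per-country operation: flag the year the agreement is signed.
# Assumes the rows of d belong to one country and are sorted by year.
mark_start <- function(d) {
  prev <- c(0L, head(d$X1, -1))  # previous year's X1 (0 assumed before year 1)
  d$start <- as.integer(d$X1 == 1 & prev == 0)
  d
}

toy <- data.frame(
  country = factor(c("A", "B", "A", "B", "A", "B")),  # countries interleaved
  year    = c(1970L, 1970L, 1971L, 1971L, 1972L, 1972L),
  X1      = c(0L, 1L, 1L, 1L, 1L, 1L)
)

ll      <- split(toy, toy$country)        # one data frame per country
ll.new  <- lapply(ll, mark_start)         # operate on each country separately
toy.new <- unsplit(ll.new, toy$country)   # back to the original row order

toy.new$start  # -> 0 1 1 0 0 0
```

The grouped mutate() in the final solution at the top of this thread does the same split-operate-recombine in a single dplyr pipeline.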
>> 
>> (b) your X1_pre5_count looks like it is the same as cumsum(1-X1)*(1-X1)
>> (within country)
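A within-window counter that reproduces the pattern of the sample's X1_pre5_count (restarting at 1 at the beginning of each 5-year window) is sketched below. It assumes X1_pre5 has already been computed (for example with pointer (c) or check_pre()) and that each country has at most one pre-agreement window; the helper name count_in_window() and the toy vectors are hypothetical.

```r
# Running count inside the pre-agreement window, 0 elsewhere.
# Assumes at most one window per country (X1 switches from 0 to 1 at most once).
count_in_window <- function(flag) cumsum(flag) * flag

country <- factor(c("A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B"))
X1_pre5 <- c(1L, 1L, 0L,                        # A: window truncated at the start
             0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L)    # B: full 5-year window

# ave() applies the function separately within each country
X1_pre5_count <- ave(X1_pre5, country, FUN = count_in_window)
X1_pre5_count  # -> 1 2 0 0 0 1 2 3 4 5 0
```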
>> 
>> (c) if you count in the opposite direction, tt <- rev(cumsum(rev(1-X1))),
>> you get the number of years until the agreement. Then X1_pre4 should be
>> as.integer(tt <= 4 & tt > 0)
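Pointer (c) can be checked against country A's expected values from the question (X1 signed in 1972, X2 signed in 1975, study period 1970-1988). The helper names years_until() and pre_flag() are hypothetical; the sketch assumes one row per year with no gaps and that an agreement, once signed, stays at 1.

```r
# Years until the agreement is first in force (0 from the signing year on).
# Assumes one row per year, sorted, and that x never reverts from 1 to 0.
years_until <- function(x) rev(cumsum(rev(1 - x)))

# 1 in the (up to) k years before signing, 0 otherwise
pre_flag <- function(x, k) {
  tt <- years_until(x)
  as.integer(tt <= k & tt > 0)
}

# Country A, 1970-1988, as in the sample data
X1_A <- c(0L, 0L, rep(1L, 17))      # X1 signed in 1972
X2_A <- c(rep(0L, 5), rep(1L, 14))  # X2 signed in 1975

years_until(X2_A)[1:6]  # -> 5 4 3 2 1 0
pre_flag(X1_A, 4)[1:3]  # -> 1 1 0          (1970-1971 flagged)
pre_flag(X2_A, 4)[1:6]  # -> 0 1 1 1 1 0    (1971-1974 flagged)
pre_flag(X2_A, 5)[1:6]  # -> 1 1 1 1 1 0    (1970-1974 flagged)
```

Per country, the same helper could be applied with ave(), e.g. ave(data$X1, data$country, FUN = function(x) pre_flag(x, 4)).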
>> 
>> -pd
>> 
>>> On 27 Jul 2019, at 09:13 , Faradj Koliev <farad...@gmail.com> wrote:
>>> 
>>> Re-post, now in *plain text*. 
>>> 
>>> 
>>> 
>>> Dear R-users, 
>>> 
>>> I’ve a rather complicated task to do and need all the help I can get. 
>>> 
>>> I have data indicating whether a country has signed an agreement or not
>>> (1=yes and 0=otherwise). I want to create a variable that captures the
>>> years before the agreement is signed. The aim is to see whether the pre-
>>> or post-agreement period has any impact on my dependent variables.
>>> 
>>> More precisely, I want to create the following variables:
>>> (i) a variable that is =1 in the 4 years before the agreement, 0
>>> otherwise;
>>> (ii) a variable that is =1 in the 5 years before the agreement, 0
>>> otherwise; and
>>> (iii) a variable that counts the years within these 4- and 5-year
>>> pre-agreement windows (1, 2, 3, 4, ...).
>>> 
>>> Please see the sample data below. I have manually added the variables I
>>> would like to generate in R, labelled “X1_pre4” (the 4 years before
>>> agreement X1), “X2_pre4” (the 4 years before agreement X2), “X1_pre5”
>>> (the 5 years before agreement X1), and “X1_pre5_count” (which counts
>>> those years: 1, 2, 3, etc.). X1 and X2 are the agreements, which a
>>> country has either signed (1) or not (0). Note that I want the variables
>>> to capture all available years up to 4 or 5: if only 2 pre-agreement
>>> years are observed, both should still be =1 (see the example below).
>>> 
>>> To illustrate the logic: in the sample data, country A signed agreement
>>> X1 in 1972, so variables (i) and (ii) should be =1 for 1970 and 1971 and
>>> =0 from 1972 until the end of the study period.
>>> 
>>> Country A signed agreement X2 in 1975, so variable (i) should be =1 from
>>> 1971 to 1974 (the 4 years before signing) and (ii) should be =1 for
>>> 1970-1974 (the 5 years before signing).
>>> 
>>> Later, I would also like to create post_4 and post_5 variables, but I think
>>> I’ll be able to figure them out once I know how to generate the
>>> pre-agreement variables.
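For the post-agreement variables, a symmetric sketch counts forward with cumsum() instead of backward. Counting the signing year as post-agreement year 1 is an assumption (the question does not define it), as is the helper name post_flag(); the sketch also assumes one row per year and that X1 never reverts to 0.

```r
# 1 in the first k years the agreement is in force, 0 otherwise.
# Assumption: the signing year itself counts as post-agreement year 1.
post_flag <- function(x, k) as.integer(x == 1 & cumsum(x) <= k)

X2_A <- c(rep(0L, 5), rep(1L, 14))  # country A: X2 signed in 1975 (1970-1988)

post_flag(X2_A, 4)[1:10]  # -> 0 0 0 0 0 1 1 1 1 0   (1975-1978 flagged)
```

If the signing year should not count, the condition cumsum(x) >= 2 & cumsum(x) <= k + 1 would flag the k years after it instead.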
>>> 
>>> All suggestions are much appreciated! 
>>> 
>>> 
>>> 
>>> data<-structure(list(country = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
>>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
>>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
>>> 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
>>> 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), 
>>>  year = c(1970L, 1971L, 1972L, 1973L, 1974L, 1975L, 1976L, 
>>>  1977L, 1978L, 1979L, 1980L, 1981L, 1982L, 1983L, 1984L, 1985L, 
>>>  1986L, 1987L, 1988L, 1970L, 1971L, 1972L, 1973L, 1974L, 1975L, 
>>>  1976L, 1977L, 1978L, 1979L, 1980L, 1981L, 1982L, 1983L, 1984L, 
>>>  1985L, 1986L, 1987L, 1988L, 1970L, 1971L, 1972L, 1973L, 1974L, 
>>>  1975L, 1976L, 1977L, 1978L, 1979L, 1980L, 1981L, 1982L, 1983L, 
>>>  1984L, 1985L, 1986L, 1987L, 1988L, 1989L, 1990L, 1991L), 
>>>  X1 = c(0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
>>>  1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
>>>  1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 
>>>  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 
>>>  1L, 1L), X2 = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 
>>>  1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 
>>>  1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 
>>>  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 
>>>  1L, 1L, 1L, 1L), X1_pre4 = c(1L, 1L, 0L, 0L, 0L, 0L, 0L, 
>>>  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
>>>  0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
>>>  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 
>>>  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X2_pre4 = c(0L, 1L, 1L, 
>>>  1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
>>>  0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
>>>  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
>>>  1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X1_pre5 = c(1L, 
>>>  1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
>>>  0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 
>>>  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
>>>  0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), 
>>>  X1_pre5_count = c(1L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
>>>  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 
>>>  4L, 5L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
>>>  0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 4L, 5L, 0L, 0L, 0L, 
>>>  0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
>>> -60L))
>>> 
>>>> On 26 Jul 2019, at 21:58, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>> 
>>>> Because you posted in HTML, your example got mangled and resulted in an
>>>> error. Re-post in *plain text* please (making sure that you cut and paste
>>>> correctly).
>>>> 
>>>> Bert Gunter
>>>> 
>>>> "The trouble with having an open mind is that people keep coming along and 
>>>> sticking things into it."
>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>> 
>>>> 
>>>> On Fri, Jul 26, 2019 at 12:25 PM Faradj Koliev <farad...@gmail.com> wrote:
>>>> 
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide 
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>> 
>> -- 
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Office: A 4.23
>> Email: pd....@cbs.dk  Priv: pda...@gmail.com
>> 
> 

