Here is another stab at it:
library(dplyr)

# first approach is broken apart to show the progression of the innards
resultStep1 <- ( teste
                 %>% group_by( ID )
                 %>% mutate( Group = as.character( Group )
                           , transitionT2 = diff( c( FALSE, "T2" == Group ) )
                           , transitionT3 = diff( c( FALSE, "T3" == Group ) )
                           , groupseqT2 = cumsum( abs( transitionT2 ) )
                           , groupseqT3 = cumsum( abs( transitionT3 ) )
                           , isT2 = 1 == groupseqT2
                           , isT3 = 1 == groupseqT3
                           )
                 %>% as.data.frame
               )
resultStep1
# notice how the groupseq columns number the groups of consecutive similar
# values, and you are only interested in the groups numbered 1.

# more compactly
result <- ( teste
            %>% group_by( ID )
            %>% mutate( Group = as.character( Group )
                      , keep = ( 1 == cumsum( abs( diff( c( FALSE, "T2" == Group ) ) ) )
                               | 1 == cumsum( abs( diff( c( FALSE, "T3" == Group ) ) ) )
                               )
                      )
            %>% filter( keep )
            %>% select( -keep )
            %>% as.data.frame
          )
#####
resultStep1
   ID Group  Var transitionT2 transitionT3 groupseqT2 groupseqT3  isT2  isT3
1   3    T2 0.32            1            0          1          0  TRUE FALSE
2   4    T3 1.59            0            1          0          1 FALSE  TRUE
3   1    T2 2.94            1            0          1          0  TRUE FALSE
4   1    T2 3.23            0            0          1          0  TRUE FALSE
5   1    T2 1.40            0            0          1          0  TRUE FALSE
6   1    T2 1.62            0            0          1          0  TRUE FALSE
7   1    T2 2.43            0            0          1          0  TRUE FALSE
8   1    T2 2.53            0            0          1          0  TRUE FALSE
9   1    T2 2.25            0            0          1          0  TRUE FALSE
10  1    T3 1.66           -1            1          2          1 FALSE  TRUE
11  1    T3 2.86            0            0          2          1 FALSE  TRUE
12  1    T3 0.53            0            0          2          1 FALSE  TRUE
13  1    T3 1.66            0            0          2          1 FALSE  TRUE
14  1    T3 3.24            0            0          2          1 FALSE  TRUE
15  1    T3 1.34            0            0          2          1 FALSE  TRUE
16  1    T2 1.86            1           -1          3          2 FALSE FALSE
17  1    T2 3.03            0            0          3          2 FALSE FALSE
18  1    T3 3.63           -1            1          4          3 FALSE FALSE
19  1    T3 2.78            0            0          4          3 FALSE FALSE
20  1    T3 1.49            0            0          4          3 FALSE FALSE
21  2    T2 2.00            1            0          1          0  TRUE FALSE
22  2    T2 2.39            0            0          1          0  TRUE FALSE
23  2    T2 1.65            0            0          1          0  TRUE FALSE
24  2    T2 2.05            0            0          1          0  TRUE FALSE
25  2    T2 2.75            0            0          1          0  TRUE FALSE
26  2    T2 2.23            0            0          1          0  TRUE FALSE
27  2    T2 1.39            0            0          1          0  TRUE FALSE
28  2    T2 2.66            0            0          1          0  TRUE FALSE
29  2    T2 1.05            0            0          1          0  TRUE FALSE
30  2    T3 2.52           -1            1          2          1 FALSE  TRUE
31  2    T2 2.49            1           -1          3          2 FALSE FALSE
32  2    T2 2.97            0            0          3          2 FALSE FALSE
33  2    T2 0.43            0            0          3          2 FALSE FALSE
34  2    T2 1.36            0            0          3          2 FALSE FALSE
35  2    T3 0.79           -1            1          4          3 FALSE FALSE
36  2    T3 1.71            0            0          4          3 FALSE FALSE
37  2    T3 1.95            0            0          4          3 FALSE FALSE
38  2    T2 2.73            1           -1          5          4 FALSE FALSE
39  2    T2 2.73            0            0          5          4 FALSE FALSE
40  2    T2 2.39            0            0          5          4 FALSE FALSE
41  2    T2 2.17            0            0          5          4 FALSE FALSE
42  2    T2 2.34            0            0          5          4 FALSE FALSE
43  2    T3 2.42           -1            1          6          5 FALSE FALSE
44  2    T3 1.75            0            0          6          5 FALSE FALSE
45  2    T3 0.66            0            0          6          5 FALSE FALSE
46  2    T3 1.64            0            0          6          5 FALSE FALSE
47  2    T2 0.24            1           -1          7          6 FALSE FALSE
48  2    T3 2.11           -1            1          8          7 FALSE FALSE
49  2    T3 2.11            0            0          8          7 FALSE FALSE
50  2    T3 1.18            0            0          8          7 FALSE FALSE

On Sun, 11 Oct 2015, peter dalgaard wrote:
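[Editorial aside, not part of the original thread: the same "keep only the first run of each Group within each ID" selection can be sketched in base R with a run counter built from cumsum() over change points. firstRunKeep is a hypothetical helper name introduced here for illustration.]

```r
# Base-R sketch (illustrative): number the runs of consecutive Group values
# within each ID, then keep only the rows in the first run seen per Group.
firstRunKeep <- function(g) {
  g <- as.character(g)
  run <- cumsum(c(TRUE, g[-1] != g[-length(g)]))  # run index down the rows
  firstRun <- tapply(run, g, min)                 # earliest run number per Group
  run == firstRun[g]                              # TRUE for rows in that first run
}
kp <- unsplit(lapply(split(teste$Group, teste$ID), firstRunKeep), teste$ID)
teste[kp, ]
```

This should select the same rows as the dplyr version above, assuming the row order within each ID is meaningful.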
These situations where the desired results depend on the order of observations in a dataset do tend to get a little tricky (this is one kind of problem that is easier to handle in a SAS DATA step with its sequential processing paradigm). I think this will do it:

keep <- function(d) with(d, {
    n <- length(Group)
    i <- c(TRUE, Group[-n] != Group[-1])
    unsplit(lapply(split(i, Group), cumsum), Group) == 1
})
kp <- unsplit(lapply(split(teste, teste$ID), keep), teste$ID)
teste[kp, ]

I.e., keep() is a function applied to each ID-subset of the data frame, returning a logical vector marking the observations that you want to keep. i is an indicator that an observation is the first in a sequence. Splitting by Group and cumsum'ing gives 1 for the first sequence, 2 for the next, etc. The observations to keep are the ones for which this value is 1.

-pd

On 10 Oct 2015, at 22:27, Cacique Samurai <caciquesamu...@gmail.com> wrote:

Hello Jeff!

Thanks very much for your prompt reply, but this is not exactly what I need. I need the first sequence of records. In the example that I sent, I need the first seven lines of group "T2" in ID "1" (lines 3 to 9) and the other six lines of group "T3" in ID "1" (lines 10 to 15). I have to discard lines 16 to 20, which represent repeated sequential records of those groups in the same ID. For the other IDs (I sent just a small piece of my data) I have many more sequential lines of records of each group in each ID, and many sequential records that should be discarded. In some cases, I have just one record of a group in an ID.

As I said, I tried to use a labeling variable that marks lines 3 to 9 as 1 (first sequence of T2 in ID 1), lines 10 to 15 as 1 (first sequence of T3 in ID 1), lines 16 and 17 as 2 (second sequence of T2 in ID 1), lines 18 to 20 as 2 (second sequence of T3 in ID 1), and so on... Then it would be easy to take just the first sequences for each ID. But the code that I wrote was one long loop after another, and in the end it did not work as I wanted.
Once more, thanks in advance for your attention and help,

Raoni

2015-10-10 13:13 GMT-03:00 Jeff Newmiller <jdnew...@dcn.davis.ca.us>:

?aggregate in base R. Make a short function that returns the first element of a vector and give that to aggregate. Or...

library(dplyr)
( test
  %>% group_by( ID, Group )
  %>% summarise( Var = first( Var ) )
  %>% as.data.frame
)

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

On October 10, 2015 8:38:00 AM PDT, Cacique Samurai <caciquesamu...@gmail.com> wrote:

Hello R-Helpers!

I have a data frame as below (dput at the end of this mail) and need to select just the first sequence of occurrences of each "Group" in each "ID". For example, for ID "1" I have two sequential occurrences of T2 and two sequential occurrences of T3:

test[test$ID == 1, ]
   ID Group  Var
3   1    T2 2.94
4   1    T2 3.23
5   1    T2 1.40
6   1    T2 1.62
7   1    T2 2.43
8   1    T2 2.53
9   1    T2 2.25
10  1    T3 1.66
11  1    T3 2.86
12  1    T3 0.53
13  1    T3 1.66
14  1    T3 3.24
15  1    T3 1.34
16  1    T2 1.86
17  1    T2 3.03
18  1    T3 3.63
19  1    T3 2.78
20  1    T3 1.49

As output, I need just the first sequence of T2 and T3 for this ID, like:

   ID Group  Var
3   1    T2 2.94
4   1    T2 3.23
5   1    T2 1.40
6   1    T2 1.62
7   1    T2 2.43
8   1    T2 2.53
9   1    T2 2.25
10  1    T3 1.66
11  1    T3 2.86
12  1    T3 0.53
13  1    T3 1.66
14  1    T3 3.24
15  1    T3 1.34

For the other IDs I have just one occurrence or sequence of occurrences of each Group. I tried to use a labeling variable, but cannot figure out how to do this without many many loops...
Thanks in advance,

Raoni

dput(teste)
structure(list(ID = structure(c(3L, 4L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", "2",
"3", "4"), class = "factor"), Group = structure(c(1L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L),
.Label = c("T2", "T3"), class = "factor"), Var = c(0.32, 1.59,
2.94, 3.23, 1.4, 1.62, 2.43, 2.53, 2.25, 1.66, 2.86, 0.53, 1.66,
3.24, 1.34, 1.86, 3.03, 3.63, 2.78, 1.49, 2, 2.39, 1.65, 2.05,
2.75, 2.23, 1.39, 2.66, 1.05, 2.52, 2.49, 2.97, 0.43, 1.36, 0.79,
1.71, 1.95, 2.73, 2.73, 2.39, 2.17, 2.34, 2.42, 1.75, 0.66, 1.64,
0.24, 2.11, 2.11, 1.18)), .Names = c("ID", "Group", "Var"),
row.names = c(NA, 50L), class = "data.frame")

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Raoni Rosa Rodrigues
Research Associate of Fish Transposition Center CTPeixes
Universidade Federal de Minas Gerais - UFMG
Brasil
rodrigues.ra...@gmail.com

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
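[Editorial aside, not part of the original thread: for readers tracing Peter's split/cumsum trick, a toy vector makes the per-Group run numbering concrete. The names g, i and runs are made up for this illustration.]

```r
# Toy illustration of the run-numbering trick described above.
g <- c("T2", "T2", "T3", "T3", "T2", "T3")
i <- c(TRUE, g[-length(g)] != g[-1])            # TRUE at the start of each run
runs <- unsplit(lapply(split(i, g), cumsum), g)
runs           # 1 1 1 1 2 2 : run number of each row within its Group
g[runs == 1]   # "T2" "T2" "T3" "T3" : rows belonging to each Group's first run
```

Keeping the rows where runs == 1 is exactly the selection the thread is after.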