Re: [R] Numbering sequences of non-NAs in a vector

Marc Schwartz Tue, 07 Jul 2009 14:57:16 -0700


On Jul 7, 2009, at 4:08 PM, Krishna Tateneni wrote:

Greetings, I have a vector of the form:
[10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9...] That is, a combination
of sequences of non-missing values and missing values, with each sequence
possibly of a different length.
I'd like to create another vector which will help me pick out the sequences
of non-missing values.  For the example above, this would be:
[1,1,1,1,1,1,NA,NA,NA,NA,2,2,2,NA,NA,NA,3,3,3,3...]. The goal ultimately is
to calculate means separately for each sequence.

Your help is appreciated.  If I'm making this more complicated than
necessary, I'd appreciate knowing that as well!

Many thanks.


Here is one possibility:

Vec <- c(10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9)

> Vec
 [1] 10  8  1  3  0  8 NA NA NA NA  2  1  6 NA NA NA  0  5  1  9


Use rle() to get the runs of NA and non-NA values. See ?rle

Runs <- rle(is.na(Vec))

> Runs
Run Length Encoding
  lengths: int [1:5] 6 4 3 3 4
  values : logi [1:5] FALSE TRUE FALSE TRUE FALSE


Create grouping values for each run:

Grps <- rep(seq(length(Runs$lengths)), Runs$lengths)

> Grps
 [1] 1 1 1 1 1 1 2 2 2 2 3 3 3 4 4 4 5 5 5 5
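If you want exactly the vector described in the question (non-NA runs numbered 1, 2, 3 and NA kept for the missing runs), one possible sketch follows. The names ids and Seqs are only illustrative, and this is just one way to do it, assuming the same Vec as above:

```r
Vec <- c(10, 8, 1, 3, 0, 8, NA, NA, NA, NA, 2, 1, 6, NA, NA, NA, 0, 5, 1, 9)

# Runs of NA (TRUE) and non-NA (FALSE) values
Runs <- rle(is.na(Vec))

# Count only the non-NA runs; give each NA run an id of NA
ids <- ifelse(Runs$values, NA_integer_, cumsum(!Runs$values))

# Expand the per-run ids back out to the length of Vec
Seqs <- rep(ids, Runs$lengths)
Seqs
```

Seqs can then be used as a grouping vector in the same way as Grps.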


Now get the means for each run, split by Grps. See ?aggregate

> aggregate(Vec, list(Grps = Grps), mean)
  Grps    x
1    1 5.00
2    2   NA
3    3 3.00
4    4   NA
5    5 3.75

If you don't want the NA runs included in the result, you could use subset():


> subset(aggregate(Vec, list(Grps = Grps), mean), !is.na(x))
  Grps    x
1    1 5.00
3    3 3.00
5    5 3.75
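
Another option, sketched here under the same assumptions, is to drop the NA elements before grouping and use tapply(), which returns a named vector rather than a data frame. The names keep and Means are only illustrative. See ?tapply

```r
Vec <- c(10, 8, 1, 3, 0, 8, NA, NA, NA, NA, 2, 1, 6, NA, NA, NA, 0, 5, 1, 9)

# Same run-based grouping as above
Runs <- rle(is.na(Vec))
Grps <- rep(seq_along(Runs$lengths), Runs$lengths)

# Keep only the non-missing elements, then take the mean within each run
keep <- !is.na(Vec)
Means <- tapply(Vec[keep], Grps[keep], mean)
Means
```

The names of Means are the original run numbers (1, 3 and 5 here), so they still line up with Grps.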


HTH,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
