On Jul 7, 2009, at 4:08 PM, Krishna Tateneni wrote:

Greetings,

I have a vector of the form:
[10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9...]

That is, a combination of sequences of non-missing values and missing
values, with each sequence possibly of a different length.

I'd like to create another vector which will help me pick out the sequences
of non-missing values.  For the example above, this would be:
[1,1,1,1,1,1,NA,NA,NA,NA,2,2,2,NA,NA,NA,3,3,3,3...]. The goal ultimately is
to calculate means separately for each sequence.

Your help is appreciated.  If I'm making this more complicated than
necessary, I'd appreciate knowing that as well!

Many thanks.

Here is one possibility:

Vec <- c(10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9)

> Vec
 [1] 10  8  1  3  0  8 NA NA NA NA  2  1  6 NA NA NA  0  5  1  9


Use rle() to get the runs of NA and non-NA values. See ?rle

Runs <- rle(is.na(Vec))

> Runs
Run Length Encoding
  lengths: int [1:5] 6 4 3 3 4
  values : logi [1:5] FALSE TRUE FALSE TRUE FALSE
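
To make the rle() result a bit more concrete, here is a small sketch that turns the run lengths into start and end positions within Vec. The names starts, ends and RunTable are only illustrative; they are not part of rle() itself.

```r
Vec <- c(10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9)
Runs <- rle(is.na(Vec))

# The cumulative sum of the run lengths gives the last index of each run
ends <- cumsum(Runs$lengths)

# Each run starts right after the previous one ends
starts <- ends - Runs$lengths + 1L

# One row per run: where it starts, where it ends, and whether it is an NA run
RunTable <- data.frame(start = starts, end = ends, is_na = Runs$values)

# inverse.rle() expands the runs back to the original logical vector
Check <- identical(inverse.rle(Runs), is.na(Vec))
```

The rows of RunTable with is_na equal to FALSE mark the stretches of real data.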


Create grouping values for each run:

Grps <- rep(seq_along(Runs$lengths), Runs$lengths)

> Grps
 [1] 1 1 1 1 1 1 2 2 2 2 3 3 3 4 4 4 5 5 5 5
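
If you want the index vector exactly as described in the question (non-NA runs numbered 1, 2, 3, ... and NA for the missing stretches), one possible variation is to number only the non-NA runs before expanding. This is just a sketch; RunIds and Idx are names made up here.

```r
Vec <- c(10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9)
Runs <- rle(is.na(Vec))

# Count the non-NA runs cumulatively; label the NA runs with NA instead
RunIds <- ifelse(Runs$values, NA_integer_, cumsum(!Runs$values))

# Repeat each label by the length of its run
Idx <- rep(RunIds, Runs$lengths)

# tapply() ignores positions where the index is NA,
# so only the non-NA runs contribute a mean
Means <- tapply(Vec, Idx, mean)
```

Here Idx has the same length as Vec, so it can be kept alongside the data for other per-run summaries.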


Now get the means for each run, split by Grps. See ?aggregate

> aggregate(Vec, list(Grps = Grps), mean)
  Grps    x
1    1 5.00
2    2   NA
3    3 3.00
4    4   NA
5    5 3.75


If you don't want the NA runs included in the result, you could use subset():

> subset(aggregate(Vec, list(Grps = Grps), mean), !is.na(x))
  Grps    x
1    1 5.00
3    3 3.00
5    5 3.75
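
Another option, sketched here under the same setup, is to drop the NA positions before computing the means, so the NA runs never enter the result. Keep and Means are illustrative names. See ?tapply

```r
Vec <- c(10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9)
Runs <- rle(is.na(Vec))
Grps <- rep(seq_along(Runs$lengths), Runs$lengths)

# Logical index of the positions that hold real data
Keep <- !is.na(Vec)

# Means by run, computed on the non-missing values only;
# the result is named by the original run numbers
Means <- tapply(Vec[Keep], Grps[Keep], mean)
```

The group labels stay as the original run numbers, which matches the subset() result above.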


HTH,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
