On Jul 7, 2009, at 4:08 PM, Krishna Tateneni wrote:
Greetings, I have a vector of the form:
[10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9...]
That is, a combination of sequences of non-missing values and missing values, with each sequence possibly of a different length.
I'd like to create another vector which will help me pick out the sequences of non-missing values. For the example above, this would be:
[1,1,1,1,1,1,NA,NA,NA,NA,2,2,2,NA,NA,NA,3,3,3,3...]
The goal ultimately is to calculate means separately for each sequence.
Your help is appreciated. If I'm making this more complicated than necessary, I'd appreciate knowing that as well!
Many thanks.
Here is one possibility:
> Vec <- c(10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9)
> Vec
 [1] 10  8  1  3  0  8 NA NA NA NA  2  1  6 NA NA NA  0  5  1  9
Use rle() to get the runs of NA and non-NA values. See ?rle
> Runs <- rle(is.na(Vec))
> Runs
Run Length Encoding
  lengths: int [1:5] 6 4 3 3 4
  values : logi [1:5] FALSE TRUE FALSE TRUE FALSE
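To see what rle() is holding, the sketch below lines its two components up side by side and then uses base R's inverse.rle() to confirm that expanding the runs reproduces is.na(Vec) exactly. The RunTable name is just for illustration.

```r
Vec <- c(10, 8, 1, 3, 0, 8, NA, NA, NA, NA, 2, 1, 6, NA, NA, NA, 0, 5, 1, 9)
Runs <- rle(is.na(Vec))

# One row per run: its length and whether it is a run of NAs
RunTable <- data.frame(length = Runs$lengths, is_na = Runs$values)
print(RunTable)

# inverse.rle() expands the runs again, so the encoding loses nothing
stopifnot(identical(inverse.rle(Runs), is.na(Vec)))
```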
Create grouping values for each run:
> Grps <- rep(seq(length(Runs$lengths)), Runs$lengths)
> Grps
[1] 1 1 1 1 1 1 2 2 2 2 3 3 3 4 4 4 5 5 5 5
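For comparison, the same Grps can be built without rle(), offered here as an alternative rather than as the method above: mark a new run wherever is.na(Vec) changes from one element to the next, and let cumsum() count the run starts. IsNA, NewRun and Grps2 are illustrative names.

```r
Vec <- c(10, 8, 1, 3, 0, 8, NA, NA, NA, NA, 2, 1, 6, NA, NA, NA, 0, 5, 1, 9)
IsNA <- is.na(Vec)

# TRUE at the first element and wherever the NA/non-NA status flips
NewRun <- c(TRUE, IsNA[-1] != IsNA[-length(IsNA)])

# Running count of run starts gives each element its run number
Grps2 <- cumsum(NewRun)
print(Grps2)
```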
Now get the means for each run, split by Grps. See ?aggregate
> aggregate(Vec, list(Grps = Grps), mean)
  Grps    x
1    1 5.00
2    2   NA
3    3 3.00
4    4   NA
5    5 3.75
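"Split by Grps" can also be taken literally: split() breaks Vec into one piece per run and sapply() applies mean() to each piece. This is a sketch of a base-R alternative to aggregate() that returns a named vector instead of a data frame; Pieces and RunMeans are illustrative names.

```r
Vec <- c(10, 8, 1, 3, 0, 8, NA, NA, NA, NA, 2, 1, 6, NA, NA, NA, 0, 5, 1, 9)
Runs <- rle(is.na(Vec))
Grps <- rep(seq(length(Runs$lengths)), Runs$lengths)

# One list element per run, named "1" through "5"
Pieces <- split(Vec, Grps)

# mean() of an all-NA run is NA, matching the aggregate() output
RunMeans <- sapply(Pieces, mean)
print(RunMeans)
```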
If you don't want the NA runs included in the result, you could use subset():
> subset(aggregate(Vec, list(Grps = Grps), mean), !is.na(x))
  Grps    x
1    1 5.00
3    3 3.00
5    5 3.75
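The exact vector from the original question, with NA at the missing positions and the non-missing runs numbered 1, 2, 3, can be built from the same rle() result. In the sketch below (Ids, SeqID and SeqMeans are illustrative names), only the non-NA runs are counted. tapply() then drops elements whose group is NA, so the NA runs never reach the result and no subset() step is needed.

```r
Vec <- c(10, 8, 1, 3, 0, 8, NA, NA, NA, NA, 2, 1, 6, NA, NA, NA, 0, 5, 1, 9)
Runs <- rle(is.na(Vec))

# Number the non-NA runs 1, 2, 3, ... and blank out the NA runs
Ids <- cumsum(!Runs$values)
Ids[Runs$values] <- NA

# Expand to one id per element of Vec
SeqID <- rep(Ids, Runs$lengths)
print(SeqID)

# Elements whose grouping value is NA are left out by tapply()
SeqMeans <- tapply(Vec, SeqID, mean)
print(SeqMeans)
```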
HTH,
Marc Schwartz
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help