Here we are assuming that:

1. a row should be extracted if its value is within 200 - 300 of the prior or next value with the same ind.
2. the input is sorted by values within ind.

If that's not the intention then modify the code accordingly.
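If the input is not already sorted, assumption 2 can be met by ordering the rows first. The following is only a minimal sketch: the data frame toy and the names toy_sorted, is_sorted and abba are made up for illustration (a few values from the example, shuffled), and it relies only on base R's order(), tapply(), is.unsorted() and diff().

```r
# Hypothetical shuffled input; 'toy' is an illustrative name, not the poster's data frame
toy <- data.frame(
  values = c(1560, 689, 1336, 3028, 2655),
  ind = c("ABBA-1", "ABBA-1", "ABBA-1", "7A5", "7A5"),
  stringsAsFactors = FALSE
)

# Sort by ind, then by values within ind (assumption 2)
toy_sorted <- toy[order(toy$ind, toy$values), ]

# Check the assumption: values must be non-decreasing within every ind
is_sorted <- tapply(toy_sorted$values, toy_sorted$ind,
                    function(v) !is.unsorted(v))
all(is_sorted)

# Gaps between consecutive values for one ind (assumption 1 is about these gaps)
abba <- toy_sorted$values[toy_sorted$ind == "ABBA-1"]
diff(abba)  # 647 224
```

Only the gap of 224 lies between 200 and 300, so under assumption 1 the two rows on either side of that gap (1336 and 1560) would be extracted.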
First we read the data into data frame DF. Then we define between(x, min, max), a function that returns a logical vector whose ith component is TRUE if x[i] is strictly between min and max. Then we use ave() to get a selection vector. Within each ind, diff(v) padded with a leading FALSE (i.e. 0) gives the gap to the prior value, and padded with a trailing FALSE gives the gap to the next value. In this case ave returns a vector of zeros and ones, and comparing it with 0 converts it to the logical vector sel which defines the selection.

# read the data
Lines <- "values ind
1 2655 7A5
2 3028 7A5
3 689 ABBA-1
4 1336 ABBA-1
5 1560 ABBA-1
6 2820 ABLIM1
7 3339 ABLIM1
8 171 ACSM5
9 195 ACSM5
10 43 ADAMDEC1
11 129 ADAMDEC1
12 1105 AFF1
13 3202 AFF1
14 852 AFF3
15 2461 AFF3
16 45 AKT1
17 397 AKT1
18 1430 AQP2
19 2402 AQP2
20 2551 ARHGAP19"
DF <- read.table(textConnection(Lines), header = TRUE)

# TRUE where x lies strictly between min and max
between <- function(x, min, max) x > min & max > x

# for each ind, flag rows whose gap to the prior or next value is in (200, 300)
sel <- ave(DF$values, DF$ind, FUN = function(v)
  between(c(FALSE, diff(v)), 200, 300) |
  between(c(diff(v), FALSE), 200, 300)
) > 0

DF[sel, ]

On Sat, May 30, 2009 at 10:13 AM, Iain Gallagher <iaingallag...@btopenworld.com> wrote:
>
> Hello list
>
> I have a problem with a dataset (see toy example below) where I am trying to
> find the difference between two (or more numbers) and discard those
> observations which fall outside a set interval.
>
> An example and further explanation:
>
>    values ind
> 1 2655 7A5
> 2 3028 7A5
> 3 689 ABBA-1
> 4 1336 ABBA-1
> 5 1560 ABBA-1
> 6 2820 ABLIM1
> 7 3339 ABLIM1
> 8 171 ACSM5
> 9 195 ACSM5
> 10 43 ADAMDEC1
> 11 129 ADAMDEC1
> 12 1105 AFF1
> 13 3202 AFF1
> 14 852 AFF3
> 15 2461 AFF3
> 16 45 AKT1
> 17 397 AKT1
> 18 1430 AQP2
> 19 2402 AQP2
> 20 2551 ARHGAP19
>
> Each number in the values column above is associated with a label (in the ind
> column). For some inds there will be only 2 values but as can be seen from
> the data other inds have many values.
>
> Here's what I want to do using the ABBA-1 data from above as an example:
>
> calculate the differences between each value:
>
> 1560-1336 = 224
> 1336-689 = 647
>
> then use these values to create an index that will allow me to pull out
> values between set limits. If I set the limits to between 200 and 300 then
> the index will reference rows 4 & 5 in the above data set.
>
> I hope this is reasonably clear and I appreciate any suggestions.
>
> Thanks
>
> Iain
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>