> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Gabor Grothendieck
> Sent: Saturday, May 30, 2009 9:11 AM
> To: Iain Gallagher
> Cc: r-help@r-project.org
> Subject: Re: [R] arithmetic problem
>
> Here we are assuming
>
> 1. for each row, if that row's value differs by between 200 and 300
> from the prior or next value with the same ind, then that row should
> be extracted.
> 2. the input is sorted by values within ind
> If that's not the intention then modify the code accordingly.
>
> First we read the data into data frame DF.
>
> Then we define between(x, min, max), a function that returns a
> vector whose ith component is TRUE if x[i] is strictly between min
> and max.
>
> Then we use ave() to get a selection vector. In this case ave()
> returns a vector of zeros and ones, and we convert that to the
> logical vector sel, which defines the selection.
>
> # read the data
> Lines <- "values ind
> 1    2655      7A5
> 2    3028      7A5
> 3     689   ABBA-1
> 4    1336   ABBA-1
> 5    1560   ABBA-1
> 6    2820   ABLIM1
> 7    3339   ABLIM1
> 8     171    ACSM5
> 9     195    ACSM5
> 10     43 ADAMDEC1
> 11    129 ADAMDEC1
> 12   1105     AFF1
> 13   3202     AFF1
> 14    852     AFF3
> 15   2461     AFF3
> 16     45     AKT1
> 17    397     AKT1
> 18   1430     AQP2
> 19   2402     AQP2
> 20   2551 ARHGAP19"
> DF <- read.table(textConnection(Lines), header = TRUE)
>
> # TRUE where x lies strictly between min and max
> between <- function(x, min, max) x > min & max > x
>
> # within each ind, flag a value whose difference from the prior
> # or the next value is between 200 and 300
> sel <- ave(DF$values, DF$ind, FUN = function(v)
>    between(c(FALSE, diff(v)), 200, 300) |
>    between(c(diff(v), FALSE), 200, 300)
> ) > 0
>
> DF[sel, ]
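To see what the function passed to ave() does within a single group, here is a minimal sketch using only the three ABBA-1 values. The names v, prior and nxt are mine, for illustration, and I pad with 0 where the quoted code pads with FALSE (which is coerced to 0 anyway).

```r
# ABBA-1 values from the example data, already sorted
v <- c(689, 1336, 1560)

between <- function(x, min, max) x > min & max > x

# difference from the prior value (padded with 0: the first element has no prior)
prior <- between(c(0, diff(v)), 200, 300)
# difference to the next value (padded with 0: the last element has no next)
nxt <- between(c(diff(v), 0), 200, 300)

diff(v)       # 647 224
prior | nxt   # FALSE TRUE TRUE
```

Only the 224 gap qualifies, so the second and third ABBA-1 values are flagged; those are rows 4 and 5 of the full data.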
Since DF is sorted appropriately, we could speed that up by avoiding
the repeated function calls made by ave(): combine (with &) your
between() test with the clause ind[-1] == ind[-length(ind)], as in

   sel1 <- with(DF, c(
      { dv <- values[-1] - values[-length(values)]; dv > 200 & dv < 300 } &
        ind[-1] == ind[-length(ind)],
      FALSE))

(This one just gives the lower of each pair.)

Someone recently proposed making a function like diff() in which you
could insert the operator of your choice, like "==" here, instead of
the usual "-". That might make code like this easier to understand.

>
>
> On Sat, May 30, 2009 at 10:13 AM, Iain Gallagher
> <iaingallag...@btopenworld.com> wrote:
> >
> > Hello list
> >
> > I have a problem with a dataset (see toy example below) where I am
> > trying to find the difference between two (or more) numbers and
> > discard those observations which fall outside a set interval.
> >
> > An example and further explanation:
> >
> >    values      ind
> > 1    2655      7A5
> > 2    3028      7A5
> > 3     689   ABBA-1
> > 4    1336   ABBA-1
> > 5    1560   ABBA-1
> > 6    2820   ABLIM1
> > 7    3339   ABLIM1
> > 8     171    ACSM5
> > 9     195    ACSM5
> > 10     43 ADAMDEC1
> > 11    129 ADAMDEC1
> > 12   1105     AFF1
> > 13   3202     AFF1
> > 14    852     AFF3
> > 15   2461     AFF3
> > 16     45     AKT1
> > 17    397     AKT1
> > 18   1430     AQP2
> > 19   2402     AQP2
> > 20   2551 ARHGAP19
> >
> > Each number in the values column above is associated with a label
> > (in the ind column). For some inds there will be only 2 values, but
> > as can be seen from the data, other inds have many values.
> >
> > Here's what I want to do, using the ABBA-1 data from above as an
> > example:
> >
> > calculate the differences between each value:
> >
> > 1560-1336 = 224
> > 1336-689 = 647
> >
> > then use these values to create an index that will allow me to pull
> > out values between set limits. If I set the limits to between 200
> > and 300 then the index will reference rows 4 & 5 in the above data
> > set.
> >
> > I hope this is reasonably clear and I appreciate any suggestions.
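For the rows 4 & 5 result asked for here, a rough sketch of how the vectorized selection could mark both members of each pair, written with a diff-like helper that takes the operator as an argument. diffWith is a hypothetical name of my own, not a base R function, and lower, both and sameInd are likewise just illustrative names. It assumes, as before, that the data are sorted by values within ind.

```r
# Example data from the thread, sorted by values within ind
DF <- data.frame(
  values = c(2655, 3028, 689, 1336, 1560, 2820, 3339, 171, 195, 43,
             129, 1105, 3202, 852, 2461, 45, 397, 1430, 2402, 2551),
  ind = c("7A5", "7A5", "ABBA-1", "ABBA-1", "ABBA-1", "ABLIM1", "ABLIM1",
          "ACSM5", "ACSM5", "ADAMDEC1", "ADAMDEC1", "AFF1", "AFF1",
          "AFF3", "AFF3", "AKT1", "AKT1", "AQP2", "AQP2", "ARHGAP19"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: like diff(), but applies `op` to each adjacent pair
diffWith <- function(x, op = `-`) op(x[-1], x[-length(x)])

dv <- diffWith(DF$values)           # same values as diff(DF$values)
sameInd <- diffWith(DF$ind, `==`)   # TRUE where adjacent rows share ind

# lower member of each qualifying pair (what sel1 gives)
lower <- c(dv > 200 & dv < 300 & sameInd, FALSE)
# shift by one position to pick up the upper member as well
both <- lower | c(FALSE, lower[-length(lower)])

DF[both, ]   # rows 4 and 5 (ABBA-1: 1336 and 1560)
```

The shift works because a TRUE in lower at position i refers to the pair (i, i + 1), so or-ing in the vector moved down by one adds row i + 1.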
> >
> > Thanks
> >
> > Iain
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.