Hello, again.
Petr Savicky wrote
>
> On Thu, Mar 01, 2012 at 05:42:48PM +0100, Petr Savicky wrote:
>> On Thu, Mar 01, 2012 at 04:27:45AM -0800, syrvn wrote:
>> > Hello,
>> >
>> > I am stuck with selecting the right rows from a data frame. I think the
>> > problem is rather how to select them
>> > than how to implement the R code.
>> >
>> > Consider the following data frame:
>> >
>> > df <- data.frame(ID = c(1,2,3,4,5,6,7,8,9,10), value =
>> > c(34,12,23,25,34,42,48,29,30,27))
>> >
>> > What I want to achieve is to select 7 rows (values) so that the mean
>> > value of those rows is closest
>> > to the value of 35 and the remaining 3 rows (values) are closest to 45.
>> > However, each value is only
>> > allowed to be sampled once!
>>
>> Hi.
>>
>> If some 3 rows have mean close to 45, then they have sum close
>> to 3*45, so the remaining 7 rows have sum close to
>>
>>   sum(df$value) - 3*45 # [1] 169
>>
>> and they have mean close to 169/7 = 24.14286. In other words,
>> the two criteria cannot be optimized together.
>>
>> For this reason, let me choose the criterion on 3 rows.
>> The closest solution may be found as follows.
>>
>>   # generate all triples and compute their means
>>   tripleMeans <- colMeans(combn(df$value, 3))
>>
>>   # select the index of the triple with mean closest to 35
>>   indClosest <- which.min(abs(tripleMeans - 35))
>
> I am sorry. There should be 45 and not 35.
>
>   indClosest <- which.min(abs(tripleMeans - 45))
>
>   # generate the indices, which form the closest triple in df$value
>   tripleInd <- combn(1:length(df$value), 3)[, indClosest]
>   tripleInd # [1] 1 6 7
>
>   # check the mean of the triple
>   mean(df$value[tripleInd]) # [1] 41.33333
>
> Petr Savicky.
>
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
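The quoted steps can be collected into one self-contained script, runnable as is. This is a sketch of the same brute-force search over all triples with `combn()`, using `seq_along()` in place of `1:length()`; the names `restTarget` and `restInd` are illustrative additions, not from the thread. It also checks the sum argument: the 7 rows left over by the best triple have a mean near 25.7, nowhere near 35.

```r
# Data from the original question
df <- data.frame(ID = 1:10,
                 value = c(34, 12, 23, 25, 34, 42, 48, 29, 30, 27))

# If 3 rows had mean 45, the other 7 rows would need this mean
restTarget <- (sum(df$value) - 3 * 45) / 7
print(restTarget)                      # 24.14286, not 35

# Every 3-row subset as a column of row indices (lexicographic order)
idx <- combn(seq_along(df$value), 3)

# Mean of each subset; combn() on the values uses the same column order
tripleMeans <- colMeans(combn(df$value, 3))

# First subset whose mean is closest to 45 (which.min keeps the first tie)
best <- which.min(abs(tripleMeans - 45))
tripleInd <- idx[, best]
print(tripleInd)                       # 1 6 7
print(mean(df$value[tripleInd]))       # 41.33333

# The 7 rows left over, and their mean
restInd <- setdiff(seq_along(df$value), tripleInd)
print(mean(df$value[restInd]))         # 25.71429
```

Since 41.33 is the largest mean any triple can reach here (48 + 42 + 34), "closest to 45" simply means "largest mean" for this data.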
There are two solutions for the 3-row criterion; 'which.min' finds only one,
the first in the order given by 'combn'.
(I've also corrected my first post, but it still had an error.)

# Forgot to change the index matrix
# (f, DF and inxmat2 come from my earlier post in this thread)
meansDist2 <- apply(inxmat2, 2, function(jnx) f(jnx, DF$value, 45))

# Two solutions
(i2 <- which(meansDist2 == min(meansDist2)))
inxmat2[, i2]
mean(DF$value[inxmat2[, i2][, 1]])
[1] 41.33333

Petr's solution and mine give the same mean value. But use either one for
small values of (n, k) only: the number of subsets, choose(n, k), grows very
quickly.

Rui Barradas
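A self-contained sketch of the same tie-aware search, which does not rely on the helper `f` and the matrix `inxmat2` from the earlier post (they are not shown here). As an assumption that holds for this data, it compares integer-valued sums with the target sum 3 * 45 instead of means, so the `==` test for ties is exact; the variable `sumDist` is an illustrative name.

```r
df <- data.frame(ID = 1:10,
                 value = c(34, 12, 23, 25, 34, 42, 48, 29, 30, 27))

# Every 3-row subset as a column of row indices
idx <- combn(seq_along(df$value), 3)

# Distance of each subset's sum from the target sum 3 * 45;
# all values are whole numbers, so these sums are exact
sumDist <- abs(colSums(combn(df$value, 3)) - 3 * 45)

# Keep every subset that attains the minimum, not just the first one
ties <- which(sumDist == min(sumDist))
print(idx[, ties, drop = FALSE])   # columns (1, 6, 7) and (5, 6, 7)

# Size of the search space; it grows very quickly with n and k
print(choose(10, 3))               # 120
```

The two ties come from the repeated value 34 in rows 1 and 5. For larger problems the enumeration quickly becomes impractical: choose(40, 20) is already above 10^11.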