On Thu, Mar 01, 2012 at 05:42:48PM +0100, Petr Savicky wrote:
> On Thu, Mar 01, 2012 at 04:27:45AM -0800, syrvn wrote:
> > Hello,
> >
> > I am stuck with selecting the right rows from a data frame. I think the
> > problem is rather how to select them
> > than how to implement the R code.
> >
> > Consider the following data frame:
> >
> > df <- data.frame(ID = c(1,2,3,4,5,6,7,8,9,10), value =
> > c(34,12,23,25,34,42,48,29,30,27))
> >
> > What I want to achieve is to select 7 rows (values) so that the mean value
> > of those rows is closest
> > to the value of 35 and the mean of the remaining 3 rows (values) is
> > closest to 45.
> > However, each value is only
> > allowed to be sampled once!
>
> Hi.
>
> If some 3 rows have mean close to 45, then they have sum close
> to 3*45, so the remaining 7 rows have sum close to
>
>   sum(df$value) - 3*45 # [1] 169
>
> and they have mean close to 169/7 = 24.14286. In other words,
> the two criteria cannot be optimized together.
>
> For this reason, let me choose the criterion on 3 rows.
> The closest solution may be found as follows.
>
>   # generate all triples and compute their means
>   tripleMeans <- colMeans(combn(df$value, 3))
>
>   # select the index of the triple with mean closest to 35
>   indClosest <- which.min(abs(tripleMeans - 35))

I am sorry. There should be 45 and not 35.

  indClosest <- which.min(abs(tripleMeans - 45))

  # generate the indices, which form the closest triple in df$value
  tripleInd <- combn(1:length(df$value), 3)[, indClosest]
  tripleInd # [1] 1 6 7

  # check the mean of the triple
  mean(df$value[tripleInd]) # [1] 41.33333

Petr Savicky.
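
For reference, the corrected steps can be collected into one self-contained script. This is only a sketch: the names tripleIdx, restMeans, tripleRows and restRows do not appear in the messages above and are introduced here purely for illustration. Besides picking the triple with mean closest to 45, it also computes the mean of the 7 remaining rows, which shows how far that mean ends up from 35 once the triple is fixed.

```r
# Data from the original question
df <- data.frame(ID = 1:10,
                 value = c(34, 12, 23, 25, 34, 42, 48, 29, 30, 27))

# All triples of row indices, one triple per column (3 x 120 matrix)
tripleIdx <- combn(nrow(df), 3)

# Mean of each triple; the complement mean follows from the total sum
tripleMeans <- colMeans(matrix(df$value[tripleIdx], nrow = 3))
restMeans <- (sum(df$value) - 3 * tripleMeans) / 7

# Triple whose mean is closest to 45 (which.min returns the first on ties)
indClosest <- which.min(abs(tripleMeans - 45))
tripleRows <- tripleIdx[, indClosest]
restRows <- setdiff(seq_len(nrow(df)), tripleRows)

tripleRows                  # [1] 1 6 7
mean(df$value[tripleRows])  # [1] 41.33333
mean(df$value[restRows])    # [1] 25.71429
restMeans[indClosest]       # same value as the line above
```

Working with row indices rather than values keeps the two rows with value 34 distinct, so restRows is always exactly the 7 rows not used in the triple.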