Hi Silvano,
I was completely stumped by your problem until I looked through Petr's response and guessed that you wanted the largest sum of 'Var.2' constrained by the specified numbers in your three schemes. I think this is what you want, but I haven't checked it exhaustively.

set.seed(123)
Var.1 <- rep(LETTERS[1:4], 10)
Var.2 <- sample(1:40, replace=FALSE)
data <- data.frame(Var.1, Var.2)
(Order <- data[order(data$Var.2, decreasing=TRUE), ])
# one row per scheme, one column per letter (A to D)
allowed<-matrix(c(3,3,2,2,2,5,0,3,3,4,2,1),nrow=3,byrow=TRUE)
colnames(allowed)<-LETTERS[1:4]
select_largest<-function(x,allowed,n=10) {
 # x must already be sorted by decreasing value (column 2)
 totals<-rep(0,nrow(allowed))
 indices<-matrix(0,ncol=n,nrow=nrow(allowed))
 for(i in 1:nrow(allowed)) {
  ii<-1
  for(j in 1:ncol(allowed)) {
   if(allowed[i,j]) {
    # rows of x holding letter j, largest values first
    indx<-which(x[,1] == colnames(allowed)[j])
    totals[i]<-totals[i]+sum(x[indx[1:allowed[i,j]],2])
    indices[i,ii:(ii+allowed[i,j]-1)]<-indx[1:allowed[i,j]]
    ii<-ii+allowed[i,j]
   }
  }
 }
 # keep the scheme with the largest total
 largest<-which.max(totals)
 return(list(scheme=largest,total=totals[largest],
  indices=sort(indices[largest,])))
}
select_largest(Order,allowed)

Jim

On Tue, Aug 24, 2021 at 7:11 PM PIKAL Petr <petr.pi...@precheza.cz> wrote:
>
> Hi.
>
> Now it is understandable. However the solution is not clear for me.
>
> table(Order$Var.1[1:10])
> A B C D
> 4 1 2 3
>
> should give you a hint which scheme could be acceptable, but how to do it
> programmatically I do not know.
>
> maybe to start with lower value in the table call and gradually increase it to
> check which scheme starts to be the chosen one
>
> > table(data.o$Var.1[1]) # scheme 2 is out
> C
> 1
> ...
> > table(data.o$Var.1[1:5]) #scheme 3
> A B C D
> 1 1 2 1
>
> > table(data.o$Var.1[1:6]) #scheme 3
>
> A B C D
> 2 1 2 1
>
> > table(data.o$Var.1[1:7]) # scheme1
> A B C D
> 2 1 2 2
>
> > table(data.o$Var.1[1:8]) # no such scheme, so scheme 1 is chosen one
> A B C D
> 2 1 2 3
>
> #Now you need to select values based on scheme 1.
> # 3A - 3B - 2C - 2D
>
> sss <- split(Order, Order$Var.1)
> selection <- c(3,3,2,2)
> result <- vector("list", 4)
>
> #I would use loop
>
> for(i in 1:4) {
> result[[i]] <- sss[[i]][1:selection[i],]
> }
>
> Maybe someone come with other ingenious solution.
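
The prefix check described just above could perhaps be automated along the following lines. This is only a sketch, not a tested solution: it walks down the sorted letters and accepts an element only if at least one scheme that has not yet been ruled out still has room for it. The function name select_sequential and the hard-coded letter vector (the first 20 rows of the set.seed(123) ordering as listed in this thread) are assumptions made for illustration.

```r
# Hypothetical helper: sequential selection with scheme elimination.
# letters_sorted: Var.1 values ordered by decreasing Var.2
# allowed: one row per scheme, one named column per letter
select_sequential <- function(letters_sorted, allowed, n = 10) {
  counts <- setNames(rep(0, ncol(allowed)), colnames(allowed))
  viable <- rep(TRUE, nrow(allowed))  # schemes not yet ruled out
  picked <- integer(0)                # accepted positions in the ordering
  for (k in seq_along(letters_sorted)) {
    if (length(picked) == n) break
    trial <- counts
    lett <- as.character(letters_sorted[k])
    trial[[lett]] <- trial[[lett]] + 1
    # a scheme still fits if no letter count exceeds its quota
    fits <- viable & apply(allowed, 1, function(s) all(trial <= s))
    if (any(fits)) {                  # otherwise skip this element
      counts <- trial
      viable <- fits
      picked <- c(picked, k)
    }
  }
  list(scheme = which(viable), indices = picked)
}

allowed <- matrix(c(3,3,2,2, 2,5,0,3, 3,4,2,1), nrow = 3, byrow = TRUE)
colnames(allowed) <- LETTERS[1:4]

# First 20 letters of the set.seed(123) ordering as listed in this thread
letters_sorted <- c("C","B","A","D","C","A","D","D","A","A",
                    "A","C","B","D","C","B","D","B","A","C")

res <- select_sequential(letters_sorted, allowed)
res$scheme   # 1
res$indices  # 1 2 3 4 5 6 7 9 13 16
```

On this ordering the sketch ends with scheme 1 and rows 1, 2, 3, 4, 5, 6, 7, 9, 13 and 16, which agrees with the manual selection quoted further down. A greedy walk like this is not guaranteed to reach n elements for every ordering, so the result should be checked.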
>
> Cheers
> Petr
>
> From: Silvano Cesar da Costa <silv...@uel.br>
> Sent: Monday, August 23, 2021 7:54 PM
> To: PIKAL Petr <petr.pi...@precheza.cz>
> Cc: r-help@r-project.org
> Subject: Re: [R] Selecting elements
>
> Hi,
>
> I apologize for the confusion. I will try to be clearer in my explanation. I
> believe that with the R script it becomes clearer.
>
> I have 4 variables with 10 repetitions and each one receives a value, randomly.
> I order the dataset from largest to smallest value. I have to select 10 elements
> in descending order of values, according to one of three schemes:
>
> # 3A - 3B - 2C - 2D
> # 2A - 5B - 0C - 3D
> # 3A - 4B - 2C - 1D
>
> If the first 3 elements (out of the 10 to be selected) are of the letter D,
> automatically the adopted scheme will be the second. So, I have to (following)
> choose 2A, 5B and 0C.
> How to make the selection automatically?
>
> I created two selection examples, with different schemes:
>
>
> set.seed(123)
>
> Var.1 = rep(LETTERS[1:4], 10)
> Var.2 = sample(1:40, replace=FALSE)
>
> data = data.frame(Var.1, Var.2)
>
> (Order = data[order(data$Var.2, decreasing=TRUE), ])
>
> # I must select the 10 highest values (),
> # but which follow a certain scheme:
> #
> # 3A - 3B - 2C - 2D or
> # 2A - 5B - 0C - 3D or
> # 3A - 4B - 2C - 1D
> #
> # In this case, I started with the highest value that refers to the letter C.
> # Next comes only 1 of the letters B, A and D. All are selected once.
> # The fifth observation is the letter C, completing 2 C values. In this case,
> # following the 3 adopted schemes, note that the second scheme has 0C,
> # so this scheme is out.
> # Therefore, it can be the first scheme (3A - 3B - 2C - 2D) or the
> # third scheme (3A - 4B - 2C - 1D).
> # The next letter to be completed is the D (fourth and seventh elements),
> # among the 10 elements being selected. Therefore, the scheme adopted is the
> # first one (3A - 3B - 2C - 2D).
> # Therefore, it is necessary to select 2 values with the letter B and 1 value
> # with the letter A.
> #
> # Manual Selection -
> # The end result is:
> (Selected.data = Order[c(1,2,3,4,5,6,7,9,13,16), ])
>
> # Scheme: 3A - 3B - 2C - 2D
> sort(Selected.data$Var.1)
>
>
> #------------------
> # Second example: -
> #------------------
> set.seed(4)
>
> Var.1 = rep(LETTERS[1:4], 10)
> Var.2 = sample(1:40, replace=FALSE)
>
> data = data.frame(Var.1, Var.2)
> (Order = data[order(data$Var.2, decreasing=TRUE), ])
>
> # The end result is:
> (Selected.data.2 = Order[c(1,2,3,4,5,6,7,8,9,11), ])
>
> # Scheme: 3A - 4B - 2C - 1D
> sort(Selected.data.2$Var.1)
>
> How to make the selection of the 10 elements automatically?
>
> Thank you very much.
>
> Prof. Dr. Silvano Cesar da Costa
> Universidade Estadual de Londrina
> Centro de Ciências Exatas
> Departamento de Estatística
>
> Phone: (43) 3371-4346
>
>
> On Mon, Aug 23, 2021 at 05:05, PIKAL Petr <petr.pi...@precheza.cz> wrote:
> Hi
>
> Only I got your HTML formatted mail, rest of the world got complete mess. Do
> not use HTML formatting.
>
> If I got it right, I wonder why in your second example you did not follow
> 3A - 3B - 2C - 2D
>
> as D were positioned 1st and 4th.
>
> I hope that you could use something like
>
> sss <- split(data$Var.2, data$Var.1)
> lapply(sss, cumsum)
> $A
> [1] 38 73 105 136 166 188 199 207 209 210
>
> $B
> [1] 39 67 92 115 131 146 153 159 164 168
>
> $C
> [1] 40 76 105 131 152 171 189 203 213 222
>
> $D
> [1] 37 71 104 131 155 175 192 205 217 220
>
> Now you need to evaluate this result according to your sets. Here the highest
> value (76) is in C so the set with 2C is the one you should choose and select
> your values according to this set.
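
One possible way to evaluate the cumulative sums against the schemes is sketched below. It assumes the criterion is the largest total over the 10 selected values, which may not be what was intended, and it uses the three schemes from the clarified message (3A - 3B - 2C - 2D, 2A - 5B - 0C - 3D, 3A - 4B - 2C - 1D). The per-letter values are hard-coded so that their cumulative sums match the output quoted above; the names schemes, scheme_total and totals are made up for illustration.

```r
# Per-letter values in decreasing order; cumsum() of each reproduces
# the cumulative sums quoted above for set.seed(123)
sss <- list(
  A = c(38, 35, 32, 31, 30, 22, 11, 8, 2, 1),
  B = c(39, 28, 25, 23, 16, 15, 7, 6, 5, 4),
  C = c(40, 36, 29, 26, 21, 19, 18, 14, 10, 9),
  D = c(37, 34, 33, 27, 24, 20, 17, 13, 12, 3)
)
cs <- lapply(sss, cumsum)

# Assumed schemes: one row per scheme, one named column per letter
schemes <- rbind(c(A = 3, B = 3, C = 2, D = 2),
                 c(A = 2, B = 5, C = 0, D = 3),
                 c(A = 3, B = 4, C = 2, D = 1))

# Total of a scheme = sum over letters of the k-th cumulative sum,
# where k is the quota for that letter (0 contributes nothing)
scheme_total <- function(s) {
  sum(mapply(function(cum, k) if (k > 0) cum[k] else 0, cs[names(s)], s))
}

totals <- apply(schemes, 1, scheme_total)
totals             # 344 308 333
which.max(totals)  # 1, i.e. 3A - 3B - 2C - 2D
```

Under that assumption the first scheme gives the largest total for this data set.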
>
> With
> > set.seed(666)
> > Var.1 = rep(LETTERS[1:4], 10)
> > Var.2 = sample(1:40, replace=FALSE)
> > data = data.frame(Var.1, Var.2)
> > data <- data[order(data$Var.2, decreasing=TRUE), ]
> > sss <- split(data$Var.2, data$Var.1)
> > lapply(sss, cumsum)
> $A
> [1] 36 70 102 133 163 182 200 207 212 213
>
> $B
> [1] 35 57 78 95 108 120 131 140 148 150
>
> $C
> [1] 40 73 102 130 156 180 196 211 221 225
>
> $D
> [1] 39 77 114 141 166 189 209 223 229 232
>
> Highest value is in D so either 3A - 3B - 2C - 2D or 3A - 3B - 2C - 2D
> should be appropriate. And here I am again lost as both sets are same. Maybe
> you need to reconsider your statements.
>
> Cheers
> Petr
>
> From: Silvano Cesar da Costa <silv...@uel.br>
> Sent: Friday, August 20, 2021 9:28 PM
> To: PIKAL Petr <petr.pi...@precheza.cz>
> Cc: r-help@r-project.org
> Subject: Re: [R] Selecting elements
>
> Hi, thanks you for the answer.
> Sorry English is not my native language.
>
> But you got it right.
> > As C is first and fourth biggest value, you follow third option and select
> > 3 highest A, 3B 2C and 2D?
>
> I must select the 10 (not 15) highest values, but which follow a certain order:
> 3A - 3B - 2C - 2D or
> 2A - 5B - 0C - 3D or
> 3A - 3B - 2C - 2D
> I'll put the example in Excel for a better understanding (with 20 elements only).
> I must select 10 elements (the highest values of variable Var.2), which fit
> one of the 3 options above.
>
> Number  Position  Var.1  Var.2
>      1        27      C     40
>      2        30      B     39
>      3         5      A     38
>      4        16      D     37
>      5        23      C     36
>      6        13      A     35
>      7        20      D     34
>      8        12      D     33
>      9         9      A     32
>     10         1      A     31
>     11        21      A     30
>     12        35      C     29
>     13        14      B     28
>     14         8      D     27
>     15         7      C     26
>     16         6      B     25
>     17        40      D     24
>     18        26      B     23
>     19        29      A     22
>     20        31      C     21
>
> Selected:
> Number  Position  Var.1  Var.2
>      1        27      C     40
>      2        30      B     39
>      3         5      A     38
>      4        16      D     37
>      5        23      C     36
>      6        13      A     35
>      7        20      D     34
>     10         9      A     32
>     13        14      B     28
>     17         6      B     25
>
> 3A - 3B - 2C - 2D
> 3A - 3B - 1C - 3D
> 2A - 5B - 0C - 3D
>
> Second option (other data set):
>
> Number  Position  Var.1  Var.2
>      1        36      D     20
>      2        11      B     19
>      3        39      A     18
>      4        24      D     17
>      5        34      B     16
>      6         2      B     15
>      7         3      A     14
>      8        32      D     13
>      9        28      D     12
>     10        25      A     11
>     11        19      B     10
>     12        15      B      9
>     13        17      A      8
>     14        18      C      7
>     15        38      B      6
>     16        10      B      5
>     17        22      B      4
>     18         4      D      3
>     19        33      A      2
>     20        37      A      1
>
> Selected:
> Number  Position  Var.1  Var.2
>      1        36      D     20
>      2        11      B     19
>      3        39      A     18
>      4        24      D     17
>      5        34      B     16
>      6         2      B     15
>      7         3      A     14
>      8        32      D     13
>      9        25      A     11
>     10        18      C      7
>
> 3A - 3B - 2C - 2D
> 3A - 3B - 1C - 3D
> 2A - 5B - 0C - 3D
>
> How to make the selection of these 10 elements that fit one of the 3 options
> using R?
>
> Thanks,
>
> Prof. Dr. Silvano Cesar da Costa
> Universidade Estadual de Londrina
> Centro de Ciências Exatas
> Departamento de Estatística
>
> Phone: (43) 3371-4346
>
>
> On Fri, Aug 20, 2021 at 03:28, PIKAL Petr <petr.pi...@precheza.cz> wrote:
> Hallo
>
> I am confused, maybe others know what do you want but could you be more
> specific?
>
> Let say you have such data
> set.seed(123)
> Var.1 = rep(LETTERS[1:4], 10)
> Var.2 = sample(1:40, replace=FALSE)
> data = data.frame(Var.1, Var.2)
>
> What should be the desired outcome?
>
> You can sort
> data <- data[order(data$Var.2, decreasing=TRUE), ]
> and split the data
> > split(data$Var.2, data$Var.1)
> $A
> [1] 38 35 32 31 30 22 11 8 2 1
>
> $B
> [1] 39 28 25 23 16 15 7 6 5 4
>
> $C
> [1] 40 36 29 26 21 19 18 14 10 9
>
> $D
> [1] 37 34 33 27 24 20 17 13 12 3
>
> To inspect highest values. But here I am lost. As C is first and fourth
> biggest value, you follow third option and select 3 highest A, 3B 2C and 2D?
>
> Or I do not understand at all what you really want to achieve.
>
> Cheers
> Petr
>
> > -----Original Message-----
> > From: R-help <r-help-boun...@r-project.org> On Behalf Of
> > Silvano Cesar da Costa
> > Sent: Thursday, August 19, 2021 10:40 PM
> > To: r-help@r-project.org
> > Subject: [R] Selecting elements
> >
> > Hi,
> >
> > I need to select 15 elements, always considering the highest values
> > (descending order) but obeying the following configuration:
> >
> > 3A - 4B - 0C - 3D or
> > 2A - 5B - 0C - 3D or
> > 3A - 3B - 2C - 2D
> >
> > If I have, for example, 5 A elements as the highest values, I can only
> > choose 3 (first and third choice) or 2 (second choice) elements.
> >
> > how to make this selection?
> >
> >
> > library(dplyr)
> >
> > Var.1 = rep(LETTERS[1:4], 10)
> > Var.2 = sample(1:40, replace=FALSE)
> >
> > data = data.frame(Var.1, Var.2)
> > (data = data[order(data$Var.2, decreasing=TRUE), ])
> >
> > Elements = data %>%
> >   arrange(desc(Var.2))
> >
> > Thanks,
> >
> > Prof. Dr. Silvano Cesar da Costa
> > Universidade Estadual de Londrina
> > Centro de Ciências Exatas
> > Departamento de Estatística
> >
> > Phone: (43) 3371-4346
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.