Hi Silvano,
Just add the selected elements to the return value:

set.seed(123)
Var.1 <- rep(LETTERS[1:4], 10)
Var.2 <- sample(1:40, replace=FALSE)
data <- data.frame(Var.1, Var.2)
(Order <- data[order(data$Var.2, decreasing=TRUE), ])
# one row per scheme, one column per letter
allowed<-matrix(c(3,3,2,2,2,5,0,3,3,4,2,1),nrow=3,byrow=TRUE)
colnames(allowed)<-LETTERS[1:4]
select_largest<-function(x,allowed,n=10) {
 # x must already be sorted by its second column, largest first
 totals<-rep(0,nrow(allowed))
 indices<-matrix(0,ncol=n,nrow=nrow(allowed))
 for(i in 1:nrow(allowed)) {
  ii<-1
  for(j in 1:ncol(allowed)) {
   if(allowed[i,j]) {
    # positions in x of the rows carrying letter j
    indx<-which(x[,1] == colnames(allowed)[j])
    # add the largest allowed[i,j] values of that letter to the scheme total
    totals[i]<-totals[i]+sum(x[indx[1:allowed[i,j]],2])
    indices[i,ii:(ii+allowed[i,j]-1)]<-indx[1:allowed[i,j]]
    ii<-ii+allowed[i,j]
   }
  }
 }
 # keep the scheme with the largest total
 largest<-which.max(totals)
 # sort the indices here
 indices<-sort(indices[largest,])
 return(list(scheme=largest,total=totals[largest],
  indices=indices,elements=x[indices,]))
}
select_largest(Order,allowed)
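For comparison, the same idea (for every scheme take the top k values of each letter, then keep the scheme with the largest total) can be written more compactly with split() and head(). This is only a sketch: pick_scheme() is a hypothetical name that does not appear in the thread, and it assumes that the data frame is already sorted by Var.2 in decreasing order and that every row of `allowed` sums to the number of elements wanted.

```r
# Sketch only: pick_scheme() is a hypothetical helper, not from the thread.
set.seed(123)
Var.1 <- rep(LETTERS[1:4], 10)
Var.2 <- sample(1:40, replace = FALSE)
data <- data.frame(Var.1, Var.2)
Order <- data[order(data$Var.2, decreasing = TRUE), ]

# one row per scheme, one column per letter
allowed <- matrix(c(3, 3, 2, 2,
                    2, 5, 0, 3,
                    3, 4, 2, 1), nrow = 3, byrow = TRUE)
colnames(allowed) <- LETTERS[1:4]

pick_scheme <- function(x, allowed) {
  # Assumption: x is sorted by Var.2, largest first
  groups <- split(x, as.character(x$Var.1))
  # For every scheme (row of allowed), keep the first k rows of each letter
  picks <- lapply(seq_len(nrow(allowed)), function(i) {
    parts <- Map(function(g, k) head(g, k),
                 groups[colnames(allowed)], allowed[i, ])
    do.call(rbind, unname(parts))
  })
  # Total of Var.2 for each scheme; the largest one wins
  totals <- vapply(picks, function(p) sum(as.numeric(p$Var.2)), numeric(1))
  best <- which.max(totals)
  chosen <- picks[[best]]
  list(scheme = best, total = totals[best],
       elements = chosen[order(chosen$Var.2, decreasing = TRUE), ])
}

res <- pick_scheme(Order, allowed)
res$elements
```

The `elements` component is the requested list of selected letters and values. Like the function above, this picks the scheme by largest total, so it has not been checked against every case of the step-by-step rule described further down the thread.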

Jim

On Thu, Aug 26, 2021 at 12:46 AM Silvano Cesar da Costa <silv...@uel.br> wrote:
>
> Wow,
>
> That's exactly what I want. But, if possible, I would like a list to be
> created with the selected elements (variable and value).
> Is it possible to add it to the output?
> Thank you very much.
>
> Prof. Dr. Silvano Cesar da Costa
> Universidade Estadual de Londrina
> Centro de Ciências Exatas
> Departamento de Estatística
>
> Phone: (43) 3371-4346
>
>
> On Wed, Aug 25, 2021 at 03:12, Jim Lemon <drjimle...@gmail.com> wrote:
>>
>> Hi Silvano,
>> I was completely stumped by your problem until I looked through Petr's
>> response and guessed that you wanted the largest sum of 'Var.2'
>> constrained by the specified numbers in your three schemes. I think
>> this is what you want, but I haven't checked it exhaustively.
>>
>> set.seed(123)
>> Var.1 <- rep(LETTERS[1:4], 10)
>> Var.2 <- sample(1:40, replace=FALSE)
>> data <- data.frame(Var.1, Var.2)
>> (Order <- data[order(data$Var.2, decreasing=TRUE), ])
>> allowed<-matrix(c(3,3,2,2,2,5,0,3,3,4,2,1),nrow=3,byrow=TRUE)
>> colnames(allowed)<-LETTERS[1:4]
>> select_largest<-function(x,allowed,n=10) {
>>  totals<-rep(0,nrow(allowed))
>>  indices<-matrix(0,ncol=n,nrow=nrow(allowed))
>>  for(i in 1:nrow(allowed)) {
>>   ii<-1
>>   for(j in 1:ncol(allowed)) {
>>    if(allowed[i,j]) {
>>     indx<-which(x[,1] == colnames(allowed)[j])
>>     totals[i]<-totals[i]+sum(x[indx[1:allowed[i,j]],2])
>>     indices[i,ii:(ii+allowed[i,j]-1)]<-indx[1:allowed[i,j]]
>>     ii<-ii+allowed[i,j]
>>    }
>>   }
>>  }
>>  largest<-which.max(totals)
>>  return(list(scheme=largest,total=totals[largest],
>>   indices=sort(indices[largest,])))
>> }
>> select_largest(Order,allowed)
>>
>> Jim
>>
>> On Tue, Aug 24, 2021 at 7:11 PM PIKAL Petr <petr.pi...@precheza.cz> wrote:
>> >
>> > Hi.
>> >
>> > Now it is understandable. However the solution is not clear for me.
>> >
>> > table(Order$Var.1[1:10])
>> > A B C D
>> > 4 1 2 3
>> >
>> > should give you a hint which scheme could be acceptable, but how to do it
>> > programmatically I do not know.
>> >
>> > Maybe start with a lower value in the table call and gradually increase it
>> > to check which scheme starts to be the chosen one
>> >
>> > > table(data.o$Var.1[1]) # scheme 2 is out
>> > C
>> > 1
>> > ...
>> > > table(data.o$Var.1[1:5]) #scheme 3
>> > A B C D
>> > 1 1 2 1
>> >
>> > > table(data.o$Var.1[1:6]) #scheme 3
>> >
>> > A B C D
>> > 2 1 2 1
>> >
>> > > table(data.o$Var.1[1:7]) # scheme1
>> > A B C D
>> > 2 1 2 2
>> >
>> > > table(data.o$Var.1[1:8]) # no such scheme, so scheme 1 is chosen one
>> > A B C D
>> > 2 1 2 3
>> >
>> > #Now you need to select values based on scheme 1.
>> > # 3A - 3B - 2C - 2D
>> >
>> > sss <- split(Order, Order$Var.1)
>> > selection <- c(3,3,2,2)
>> > result <- vector("list", 4)
>> >
>> > #I would use loop
>> >
>> > for(i in 1:4) {
>> > result[[i]] <- sss[[i]][1:selection[i],]
>> > }
>> >
>> > Maybe someone will come up with another ingenious solution.
>> >
>> > Cheers
>> > Petr
>> >
>> > From: Silvano Cesar da Costa <silv...@uel.br>
>> > Sent: Monday, August 23, 2021 7:54 PM
>> > To: PIKAL Petr <petr.pi...@precheza.cz>
>> > Cc: r-help@r-project.org
>> > Subject: Re: [R] Selecting elements
>> >
>> > Hi,
>> >
>> > I apologize for the confusion. I will try to be clearer in my explanation.
>> > I believe that with the R script it becomes clearer.
>> >
>> > I have 4 variables with 10 repetitions and each one receives a value,
>> > randomly.
>> > I order the dataset from largest to smallest value. I have to select 10
>> > elements in descending order of values, according to one of three schemes:
>> >
>> > # 3A - 3B - 2C - 2D
>> > # 2A - 5B - 0C - 3D
>> > # 3A - 4B - 2C - 1D
>> >
>> > If the first 3 elements (out of the 10 to be selected) are of the letter
>> > D, automatically the adopted scheme will be the second.
>> > So, I then have to choose 2A, 5B and 0C.
>> > How to make the selection automatically?
>> >
>> > I created two selection examples, with different schemes:
>> >
>> >
>> >
>> > set.seed(123)
>> >
>> > Var.1 = rep(LETTERS[1:4], 10)
>> > Var.2 = sample(1:40, replace=FALSE)
>> >
>> > data = data.frame(Var.1, Var.2)
>> >
>> > (Order = data[order(data$Var.2, decreasing=TRUE), ])
>> >
>> > # I must select the 10 highest values,
>> > # but which follow a certain scheme:
>> > #
>> > # 3A - 3B - 2C - 2D or
>> > # 2A - 5B - 0C - 3D or
>> > # 3A - 4B - 2C - 1D
>> > #
>> > # In this case, I started with the highest value that refers to the letter C.
>> > # Next comes only 1 of the letters B, A and D. All are selected once.
>> > # The fifth observation is the letter C, completing 2 C values. In this case,
>> > # following the 3 adopted schemes, note that the second scheme has 0C,
>> > # so this scheme is out.
>> > # Therefore, it can be the first scheme (3A - 3B - 2C - 2D) or the
>> > # third scheme (3A - 4B - 2C - 1D).
>> > # The next letter to be completed is the D (fourth and seventh elements),
>> > # among the 10 elements being selected. Therefore, the scheme adopted is the
>> > # first one (3A - 3B - 2C - 2D).
>> > # Therefore, it is necessary to select 2 values with the letter B and 1 value
>> > # with the letter A.
>> > #
>> > # Manual Selection -
>> > # The end result is:
>> > (Selected.data = Order[c(1,2,3,4,5,6,7,9,13,16), ])
>> >
>> > # Scheme: 3A - 3B - 2C - 2D
>> > sort(Selected.data$Var.1)
>> >
>> >
>> > #------------------
>> > # Second example: -
>> > #------------------
>> > set.seed(4)
>> >
>> > Var.1 = rep(LETTERS[1:4], 10)
>> > Var.2 = sample(1:40, replace=FALSE)
>> >
>> > data = data.frame(Var.1, Var.2)
>> > (Order = data[order(data$Var.2, decreasing=TRUE), ])
>> >
>> > # The end result is:
>> > (Selected.data.2 = Order[c(1,2,3,4,5,6,7,8,9,11), ])
>> >
>> > # Scheme: 3A - 4B - 2C - 1D
>> > sort(Selected.data.2$Var.1)
>> >
>> > How to make the selection of the 10 elements automatically?
>> >
>> > Thank you very much.
>> >
>> > Prof. Dr. Silvano Cesar da Costa
>> > Universidade Estadual de Londrina
>> > Centro de Ciências Exatas
>> > Departamento de Estatística
>> >
>> > Phone: (43) 3371-4346
>> >
>> >
>> > On Mon, Aug 23, 2021 at 05:05, PIKAL Petr
>> > <petr.pi...@precheza.cz> wrote:
>> > Hi
>> >
>> > Only I got your HTML formatted mail, the rest of the world got a complete
>> > mess. Do not use HTML formatting.
>> >
>> > If I got it right, I wonder why in your second example you did not follow
>> > 3A - 3B - 2C - 2D
>> >
>> > as D were positioned 1st and 4th.
>> >
>> > I hope that you could use something like
>> >
>> > sss <- split(data$Var.2, data$Var.1)
>> > lapply(sss, cumsum)
>> > $A
>> > [1] 38 73 105 136 166 188 199 207 209 210
>> >
>> > $B
>> > [1] 39 67 92 115 131 146 153 159 164 168
>> >
>> > $C
>> > [1] 40 76 105 131 152 171 189 203 213 222
>> >
>> > $D
>> > [1] 37 71 104 131 155 175 192 205 217 220
>> >
>> > Now you need to evaluate this result according to your sets. Here the
>> > highest value (76) is in C so the set with 2C is the one you should choose
>> > and select your values according to this set.
>> >
>> > With
>> > > set.seed(666)
>> > > Var.1 = rep(LETTERS[1:4], 10)
>> > > Var.2 = sample(1:40, replace=FALSE)
>> > > data = data.frame(Var.1, Var.2)
>> > > data <- data[order(data$Var.2, decreasing=TRUE), ]
>> > > sss <- split(data$Var.2, data$Var.1)
>> > > lapply(sss, cumsum)
>> > $A
>> > [1] 36 70 102 133 163 182 200 207 212 213
>> >
>> > $B
>> > [1] 35 57 78 95 108 120 131 140 148 150
>> >
>> > $C
>> > [1] 40 73 102 130 156 180 196 211 221 225
>> >
>> > $D
>> > [1] 39 77 114 141 166 189 209 223 229 232
>> >
>> > Highest value is in D so either 3A - 3B - 2C - 2D or 3A - 3B - 2C - 2D
>> > should be appropriate. And here I am again lost as both sets are the same.
>> > Maybe you need to reconsider your statements.
>> >
>> > Cheers
>> > Petr
>> >
>> > From: Silvano Cesar da Costa <silv...@uel.br>
>> > Sent: Friday, August 20, 2021 9:28 PM
>> > To: PIKAL Petr <petr.pi...@precheza.cz>
>> > Cc: r-help@r-project.org
>> > Subject: Re: [R] Selecting elements
>> >
>> > Hi, thank you for the answer.
>> > Sorry, English is not my native language.
>> >
>> > But you got it right.
>> > > As C is first and fourth biggest value, you follow third option and
>> > > select 3 highest A, 3B 2C and 2D?
>> >
>> > I must select the 10 (not 15) highest values, but which follow a certain
>> > order:
>> > 3A - 3B - 2C - 2D or
>> > 2A - 5B - 0C - 3D or
>> > 3A - 3B - 2C - 2D
>> > I'll put the example in Excel for a better understanding (with 20 elements
>> > only).
>> > I must select 10 elements (the highest values of variable Var.2), which
>> > fit one of the 3 options above.
>> >
>> > Number  Position  Var.1  Var.2
>> >      1        27      C     40
>> >      2        30      B     39
>> >      3         5      A     38
>> >      4        16      D     37
>> >      5        23      C     36
>> >      6        13      A     35
>> >      7        20      D     34
>> >      8        12      D     33
>> >      9         9      A     32
>> >     10         1      A     31
>> >     11        21      A     30
>> >     12        35      C     29
>> >     13        14      B     28
>> >     14         8      D     27
>> >     15         7      C     26
>> >     16         6      B     25
>> >     17        40      D     24
>> >     18        26      B     23
>> >     19        29      A     22
>> >     20        31      C     21
>> >
>> > Selected:
>> >
>> > Number  Position  Var.1  Var.2
>> >      1        27      C     40
>> >      2        30      B     39
>> >      3         5      A     38
>> >      4        16      D     37
>> >      5        23      C     36
>> >      6        13      A     35
>> >      7        20      D     34
>> >     10         9      A     32
>> >     13        14      B     28
>> >     17         6      B     25
>> >
>> > 3A - 3B - 2C - 2D
>> > 3A - 3B - 1C - 3D
>> > 2A - 5B - 0C - 3D
>> >
>> >
>> > Second option (other data set):
>> >
>> > Number  Position  Var.1  Var.2
>> >      1        36      D     20
>> >      2        11      B     19
>> >      3        39      A     18
>> >      4        24      D     17
>> >      5        34      B     16
>> >      6         2      B     15
>> >      7         3      A     14
>> >      8        32      D     13
>> >      9        28      D     12
>> >     10        25      A     11
>> >     11        19      B     10
>> >     12        15      B      9
>> >     13        17      A      8
>> >     14        18      C      7
>> >     15        38      B      6
>> >     16        10      B      5
>> >     17        22      B      4
>> >     18         4      D      3
>> >     19        33      A      2
>> >     20        37      A      1
>> >
>> > Selected:
>> >
>> > Number  Position  Var.1  Var.2
>> >      1        36      D     20
>> >      2        11      B     19
>> >      3        39      A     18
>> >      4        24      D     17
>> >      5        34      B     16
>> >      6         2      B     15
>> >      7         3      A     14
>> >      8        32      D     13
>> >      9        25      A     11
>> >     10        18      C      7
>> >
>> > 3A - 3B - 2C - 2D
>> > 3A - 3B - 1C - 3D
>> > 2A - 5B - 0C - 3D
>> >
>> >
>> > How to make the selection of these 10 elements that fit one of the 3
>> > options using R?
>> >
>> > Thanks,
>> >
>> > Prof. Dr. Silvano Cesar da Costa
>> > Universidade Estadual de Londrina
>> > Centro de Ciências Exatas
>> > Departamento de Estatística
>> >
>> > Phone: (43) 3371-4346
>> >
>> >
>> > On Fri, Aug 20, 2021 at 03:28, PIKAL Petr
>> > <petr.pi...@precheza.cz> wrote:
>> > Hallo
>> >
>> > I am confused, maybe others know what do you want but could you be more
>> > specific?
>> >
>> > Let's say you have such data
>> > set.seed(123)
>> > Var.1 = rep(LETTERS[1:4], 10)
>> > Var.2 = sample(1:40, replace=FALSE)
>> > data = data.frame(Var.1, Var.2)
>> >
>> > What should be the desired outcome?
>> >
>> > You can sort
>> > data <- data[order(data$Var.2, decreasing=TRUE), ]
>> > and split the data
>> > > split(data$Var.2, data$Var.1)
>> > $A
>> > [1] 38 35 32 31 30 22 11 8 2 1
>> >
>> > $B
>> > [1] 39 28 25 23 16 15 7 6 5 4
>> >
>> > $C
>> > [1] 40 36 29 26 21 19 18 14 10 9
>> >
>> > $D
>> > [1] 37 34 33 27 24 20 17 13 12 3
>> >
>> > to inspect the highest values. But here I am lost. As C is first and fourth
>> > biggest value, you follow third option and select 3 highest A, 3B 2C and
>> > 2D?
>> >
>> > Or I do not understand at all what you really want to achieve.
>> >
>> > Cheers
>> > Petr
>> >
>> > > -----Original Message-----
>> > > From: R-help <r-help-boun...@r-project.org> On Behalf Of
>> > > Silvano Cesar da Costa
>> > > Sent: Thursday, August 19, 2021 10:40 PM
>> > > To: r-help@r-project.org
>> > > Subject: [R] Selecting elements
>> > >
>> > > Hi,
>> > >
>> > > I need to select 15 elements, always considering the highest values
>> > > (descending order) but obeying the following configuration:
>> > >
>> > > 3A - 4B - 0C - 3D or
>> > > 2A - 5B - 0C - 3D or
>> > > 3A - 3B - 2C - 2D
>> > >
>> > > If I have, for example, 5 A elements as the highest values, I can only
>> > > choose 3 (first and third choice) or 2 (second choice) elements.
>> > >
>> > > How to make this selection?
>> > >
>> > >
>> > > library(dplyr)
>> > >
>> > > Var.1 = rep(LETTERS[1:4], 10)
>> > > Var.2 = sample(1:40, replace=FALSE)
>> > >
>> > > data = data.frame(Var.1, Var.2)
>> > > (data = data[order(data$Var.2, decreasing=TRUE), ])
>> > >
>> > > Elements = data %>%
>> > > arrange(desc(Var.2))
>> > >
>> > > Thanks,
>> > >
>> > > Prof. Dr. Silvano Cesar da Costa
>> > > Universidade Estadual de Londrina
>> > > Centro de Ciências Exatas
>> > > Departamento de Estatística
>> > >
>> > > Phone: (43) 3371-4346
>> > >
>> > > [[alternative HTML version deleted]]
>> > >
>> > > ______________________________________________
>> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > > https://stat.ethz.ch/mailman/listinfo/r-help
>> > > PLEASE do read the posting guide
>> > > http://www.R-project.org/posting-guide.html
>> > > and provide commented, minimal, self-contained, reproducible code.
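
The step-by-step rule described in this thread (walk down the sorted values, accept an element only while at least one scheme still has room for its letter, and drop schemes as they become incompatible) can be sketched as below. This is one reading of that description, not code posted by anyone in the thread: select_sequential() is a hypothetical name, and the 20 rows are copied from the first table in the thread so that the example does not depend on the random number generator.

```r
# Sketch only: select_sequential() is a hypothetical name, and the rule it
# implements is an assumption based on the description in the thread.

# First 20 rows of the sorted data, copied from the table in the thread
Order <- data.frame(
  Var.1 = c("C", "B", "A", "D", "C", "A", "D", "D", "A", "A",
            "A", "C", "B", "D", "C", "B", "D", "B", "A", "C"),
  Var.2 = 40:21
)

# one row per scheme, one column per letter
allowed <- matrix(c(3, 3, 2, 2,
                    2, 5, 0, 3,
                    3, 4, 2, 1), nrow = 3, byrow = TRUE)
colnames(allowed) <- LETTERS[1:4]

select_sequential <- function(x, allowed, n = 10) {
  # x: data frame sorted by value (largest first), letters in column 1
  counts <- setNames(rep(0, ncol(allowed)), colnames(allowed))
  alive <- rep(TRUE, nrow(allowed))  # schemes still compatible so far
  keep <- integer(0)                 # row positions accepted so far
  for (r in seq_len(nrow(x))) {
    if (length(keep) == n) break
    letter <- as.character(x[r, 1])
    # schemes that are still alive and have room for one more of this letter
    ok <- alive & allowed[, letter] > counts[letter]
    if (any(ok)) {
      keep <- c(keep, r)
      counts[letter] <- counts[letter] + 1
      alive <- ok  # schemes without room for this letter drop out
    }
  }
  list(scheme = which(alive), indices = keep, elements = x[keep, ])
}

res <- select_sequential(Order, allowed)
res$scheme
res$indices
res$elements
```

On these rows the sketch keeps positions 1, 2, 3, 4, 5, 6, 7, 9, 13 and 16 and ends with scheme 1 (3A - 3B - 2C - 2D), which is the manual selection given in the thread. Unlike select_largest(), it follows the order of the values rather than comparing scheme totals, so the two approaches may not agree on other data.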