Assuming your data frame is called DF, we can use sqldf like this. For each P1id group, the inner select calculates the maximum AreaPoly2 among the rows where Veg1 = Veg2, and the outer select returns the row in that group whose AreaPoly2 equals that maximum.

library(sqldf)
sqldf("select * from DF a where AreaPoly2 =
  (select max(AreaPoly2) from DF where Veg1 = Veg2 and P1id = a.P1id)")

Running it looks like this:

> library(sqldf)
> sqldf("select * from DF a where AreaPoly2 =
+  (select max(AreaPoly2) from DF where Veg1 = Veg2 and P1id = a.P1id)")
  P1id Veg1 Veg2 AreaPoly2 P2ID
1    1    p    p       1.5    2
2    2    p    p       2.0    3

On Mon, Dec 28, 2009 at 8:03 PM, Seth W Bigelow <sbige...@fs.fed.us> wrote:
> I have a data set similar to this:
>
> P1id  Veg1  Veg2  AreaPoly2  P2ID
> 1     p     p     1          1
> 1     p     p     1.5        2
> 2     p     p     2          3
> 2     p     h     3.5        4
>
> For each group of "Poly1id" records, I wish to output (subset) the record
> which has largest "AreaPoly2" value, but only if Veg1=Veg2. For this
> example, the desired dataset would be
>
> P1id  Veg1  Veg2  AreaPoly2  P2ID
> 1     p     p     1.5        2
> 2     p     p     2          3
>
> Can anyone point me in the right direction on this?
>
> Dr. Seth W. Bigelow
> Biologist, USDA-FS Pacific Southwest Research Station
> 1731 Research Park Drive, Davis California
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>