Hi, thank you for your suggestions. I think I'll stick with Dennis' approach, as it is a pure indexing solution:

# each row gets the size of its own group; "> 4" is the same as ">= 5"
# factor() keeps this working in R >= 4.0, where data.frame() no longer
# converts character columns to factors by default
df[ave(as.numeric(factor(df$group)), df$group, FUN = length) > 4, ]

I'll try that out now...

Best regards,
Johannes

-------- Original Message --------
> Date: Thu, 24 Nov 2011 09:12:57 -0500
> From: Gabor Grothendieck <ggrothendi...@gmail.com>
> To: Johannes Radinger <jradin...@gmx.at>
> CC: r-help@r-project.org
> Subject: Re: [R] dataframe indexing by number of cases per group

> On Thu, Nov 24, 2011 at 7:02 AM, Johannes Radinger <jradin...@gmx.at>
> wrote:
> > Hello,
> >
> > assume we have the following data frame:
> >
> > group <- c(rep("A", 5), rep("B", 6), rep("C", 4))
> > x <- c(runif(5, 1, 5), runif(6, 1, 10), runif(4, 2, 15))
> > df <- data.frame(group, x)
> >
> > Now I want to select all cases (rows) of those groups
> > which have 5 or more cases (so I want to select
> > all cases of groups A and B).
> > How can I use indexing for such questions?
> >
> > df[??] ... I think it is probably quite easy, but I really
> > don't know how to do it at the moment.
> >
> > Maybe someone can help me...
> >
>
> Here are three approaches:
>
> subset(merge(df, xtabs(~ group, df)), Freq >= 5)
>
> subset(transform(df, len = ave(x, group, FUN = length)), len >= 5)
>
> library(sqldf)
> sqldf('select a.*
>   from df a join (select "group", count(*) "count" from df group by "group")
>   using ("group")
>   where "count" >= 5')
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
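
For reference, here is a minimal, self-contained sketch of the pure indexing idea, assuming the example data from the question (the set.seed() call and the variable names n_per_row, counts and keep are additions for illustration only). It builds the logical row index in two ways, once per row with ave() and once per group with table(), and checks that both select the same rows:

```r
# Example data from the question; the seed is added only for repeatability
set.seed(1)
group <- c(rep("A", 5), rep("B", 6), rep("C", 4))
x <- c(runif(5, 1, 5), runif(6, 1, 10), runif(4, 2, 15))
df <- data.frame(group, x)

# 1) ave(): every row receives the number of rows in its own group
n_per_row <- ave(seq_along(df$group), df$group, FUN = length)
sel_ave <- df[n_per_row >= 5, ]

# 2) table(): count each group once, then keep rows whose group qualifies
counts <- table(df$group)
keep <- names(counts)[counts >= 5]
sel_table <- df[df$group %in% keep, ]

# Both logical indices pick the same rows (all of groups A and B)
stopifnot(identical(sel_ave, sel_table))
print(unique(as.character(sel_ave$group)))
```

The ave() form avoids a merge and keeps the original row order; the table() form is handy when you also want to inspect the group counts themselves.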