I have to take the first 25 IDs ranked by count(*). But the following is not allowed in Hive
select id from T order by count(*) desc limit 25; Which yields a "NOt yet supported place for UDAF count". The way around it is the following select id, count(*) as cnt from T group by id order by cnt desc limit 25; However I need to put this query in a subquery like so select id, XXX from T t join .... where t.id in (select id, count(*) as cnt from T group by id order by cnt desc limit 25) group by ... ; which does NOT work because a subquery is allowed to return only one thing. Here XXX are complex constructs like distance calculation and binning. These are time consuming and complex operations. The only way I see this would work is if I use the following sub-sub query select id, XXX from T t join ... where t.id in (select sub.id from (select id, count(*) as cnt from T group by id order by cnt desc limit 25)sub) group by ... ; The reason I don't want to put the limit in the outermost query is because those XXX queries are expensive and I don't want them to be performed on the entire result only to retain the top 25 and throw away the rest of the results. The count(*) operation of the rest of the IDs is not very expensive or time consuming. Is there some other elegant way of handling this without using a sub-sub-query approach?