On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote: > http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
Is there any possibility there is a bug in Ritesh Agrawal's query statement from that article? I created a test table with the exact column names from the example in the article and used a minimally altered version of the command (I removed the where clause to simplify things a bit) and got an error which suggests there is something slightly wrong with the command (or perhaps the table has to be configured a special way). Here's what I get when I almost perfectly duplicate that example: hive> describe test; OK user string category string value int Time taken: 0.082 seconds ================================================== hive> select * from test; OK user1 cat1 1 user1 cat1 2 user1 cat1 3 user1 cat2 10 user1 cat2 20 user1 cat2 30 user2 cat1 11 user2 cat1 21 user2 cat1 31 user2 cat2 5 user2 cat2 6 user2 cat2 7 Time taken: 0.202 seconds ================================================== hive> SELECT user, category, value > FROM ( > SELECT user, category, rank(user) as rank, value > FROM test > DISTRIBUTE BY user > SORT BY user, value desc > ) a > WHERE rank > 1 > ORDER BY user, rank; FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or column reference 'rank': (possible column names are: _col0, _col1, _col2) hive> ________________________________________________________________________________ Keith Wiley kwi...@keithwiley.com keithwiley.com music.keithwiley.com "What I primarily learned in grad school is how much I *don't* know. Consequently, I left grad school with a higher ignorance to knowledge ratio than when I entered." -- Keith Wiley ________________________________________________________________________________