git clone https://github.com/edwardcapriolo/hive_test.git cd hive_test/ mvn -Dmaven.test.skip=true install cd .. [edward@jackintosh java]$ git clone https://github.com/edwardcapriolo/hive-rank.git Cloning into 'hive-rank'... remote: Counting objects: 74, done. remote: Compressing objects: 100% (35/35), done. remote: Total 74 (delta 12), reused 70 (delta 8) Unpacking objects: 100% (74/74), done. [edward@jackintosh java]$ cd hive-rank/ [edward@jackintosh hive-rank]$ mvn install -Dmaven.test.skip=true /usr/java/jdk1.7.0_13 ... [INFO] Installing /home/edward/Documents/java/hive-rank/target/hive-rank-1.0.0-SNAPSHOT.jar to /home/edward/.m2/repository/com/m6d/hive-rank/1.0.0-SNAPSHOT/hive-rank-1.0.0-SNAPSHOT.jar [INFO] Installing /home/edward/Documents/java/hive-rank/pom.xml to /home/edward/.m2/repository/com/m6d/hive-rank/1.0.0-SNAPSHOT/hive-rank-1.0.0-SNAPSHOT.pom [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 2.986s [INFO] Finished at: Tue Apr 02 16:11:41 EDT 2013 [INFO] Final Memory: 17M/210M [INFO] ------------------------------------------------------------------------
Then copy: cp /home/edward/Documents/java/hive-rank/target/hive-rank-1.0.0-SNAPSHOT.jar to /home/edward/.m2/repository/com/m6d/hive-rank/1.0.0-SNAPSHOT/hive-rank-1.0.0-SNAPSHOT.jar to your hadoop lib. add jar <name of jar file> ..... etc etc On Tue, Apr 2, 2013 at 3:51 PM, Keith Wiley <kwi...@keithwiley.com> wrote: > Yep, the original article is definitely erroneous in this regard. I > figured out that eventually. I'm not sure how much I can trust that > resource now. I may have to look elsewhere. I agree that Edward's > description is pretty good, but as I said earlier, I can't actually use his > code, so I'm trying to cobble a workable solution together from the various > resources available. Ritesh's article, despite the error in the Hive > syntax, is still useful in that it enables one to quickly compile a simple > rank jar without relying on git, maven, or other project dependencies -- > problems which have plagued me with Edward's approach. So, if I can use > Ritesh's method to write a simple rank function, and Edward's accurate > description of how to construct the query, then I can put all the pieces > together into a workable solution. > > I'll let you know if I get it. > > On Apr 2, 2013, at 10:56 , Igor Tatarinov wrote: > > > You are getting the error because you are ORDERing BY rank but rank is > not in the top SELECT > > > > Also, DISTRIBUTE BY/SORT BY are done after SELECT so you have to use a > subquery: > > SELECT ..., rank(user) > > FROM (SELECT ... DISTRIBUTE BY ... SORT BY) > > > > igor > > decide.com > > > > > > On Tue, Apr 2, 2013 at 10:03 AM, Keith Wiley <kwi...@keithwiley.com> > wrote: > > On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote: > > > > > > http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/ > > > > Is there any possibility there is a bug in Ritesh Agrawal's query > statement from that article? I created a test table with the exact column > names from the example in the article and used a minimally altered version > of the command (I removed the where clause to simplify things a bit) and > got an error which suggests there is something slightly wrong with the > command (or perhaps the table has to be configured a special way). Here's > what I get when I almost perfectly duplicate that example: > > > > hive> describe test; > > OK > > user string > > category string > > value int > > Time taken: 0.082 seconds > > ================================================== > > hive> select * from test; > > OK > > user1 cat1 1 > > user1 cat1 2 > > user1 cat1 3 > > user1 cat2 10 > > user1 cat2 20 > > user1 cat2 30 > > user2 cat1 11 > > user2 cat1 21 > > user2 cat1 31 > > user2 cat2 5 > > user2 cat2 6 > > user2 cat2 7 > > Time taken: 0.202 seconds > > ================================================== > > hive> SELECT user, category, value > > > FROM ( > > > SELECT user, category, rank(user) as rank, value > > > FROM test > > > DISTRIBUTE BY user > > > SORT BY user, value desc > > > ) a > > > WHERE rank > 1 > > > ORDER BY user, rank; > > FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or > column reference 'rank': (possible column names are: _col0, _col1, _col2) > > hive> > > > > > ________________________________________________________________________________ > > Keith Wiley kwi...@keithwiley.com keithwiley.com > music.keithwiley.com > > > > "What I primarily learned in grad school is how much I *don't* know. > > Consequently, I left grad school with a higher ignorance to knowledge > ratio than > > when I entered." > > -- Keith Wiley > > > ________________________________________________________________________________ > > > > > > > > ________________________________________________________________________________ > Keith Wiley kwi...@keithwiley.com keithwiley.com > music.keithwiley.com > > "Luminous beings are we, not this crude matter." > -- Yoda > > ________________________________________________________________________________ > >