Re: Need rank()

Edward Capriolo Tue, 02 Apr 2013 13:15:16 -0700

git clone https://github.com/edwardcapriolo/hive_test.git
cd hive_test/
mvn -Dmaven.test.skip=true install
cd ..
[edward@jackintosh java]$ git clone https://github.com/edwardcapriolo/hive-rank.git
Cloning into 'hive-rank'...
remote: Counting objects: 74, done.
remote: Compressing objects: 100% (35/35), done.
remote: Total 74 (delta 12), reused 70 (delta 8)
Unpacking objects: 100% (74/74), done.
[edward@jackintosh java]$ cd hive-rank/
[edward@jackintosh hive-rank]$ mvn install -Dmaven.test.skip=true
/usr/java/jdk1.7.0_13
...
[INFO] Installing /home/edward/Documents/java/hive-rank/target/hive-rank-1.0.0-SNAPSHOT.jar to /home/edward/.m2/repository/com/m6d/hive-rank/1.0.0-SNAPSHOT/hive-rank-1.0.0-SNAPSHOT.jar
[INFO] Installing /home/edward/Documents/java/hive-rank/pom.xml to /home/edward/.m2/repository/com/m6d/hive-rank/1.0.0-SNAPSHOT/hive-rank-1.0.0-SNAPSHOT.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.986s
[INFO] Finished at: Tue Apr 02 16:11:41 EDT 2013
[INFO] Final Memory: 17M/210M
[INFO] ------------------------------------------------------------------------


Then copy the built jar,
/home/edward/Documents/java/hive-rank/target/hive-rank-1.0.0-SNAPSHOT.jar
(Maven also installs the same jar to
/home/edward/.m2/repository/com/m6d/hive-rank/1.0.0-SNAPSHOT/hive-rank-1.0.0-SNAPSHOT.jar),
to your hadoop lib.

add jar <name of jar file>
..... etc etc
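
For anyone who wants to see what a rank function of this kind does before building the jar, here is a minimal sketch in plain Java. It is not the hive-rank source: the class name RankSketch and its evaluate method are hypothetical, and it assumes the simple approach of a counter that restarts whenever the key changes, with ranks starting at 0. A real Hive UDF would also have to extend Hive's UDF base class, which is left out here so the sketch compiles on its own.

```java
import java.util.Objects;

// Hypothetical sketch, not the hive-rank source: a counter that restarts
// whenever the key differs from the key seen on the previous call.
public class RankSketch {
    private String lastKey; // key seen on the previous call (null before the first call)
    private int counter;    // rank to hand out next within the current key

    // Returns 0 for the first row of a key, 1 for the second, and so on.
    // Only meaningful when rows with the same key arrive consecutively,
    // which is why the rows must be distributed and sorted first.
    public int evaluate(String key) {
        if (!Objects.equals(key, lastKey)) {
            counter = 0;
            lastKey = key;
        }
        return counter++;
    }

    public static void main(String[] args) {
        RankSketch rank = new RankSketch();
        String[] users = {"user1", "user1", "user1", "user2", "user2"};
        for (String user : users) {
            System.out.println(user + "\t" + rank.evaluate(user));
        }
    }
}
```

Because the function keeps state between calls, the numbers it returns depend entirely on the order in which rows reach it.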


On Tue, Apr 2, 2013 at 3:51 PM, Keith Wiley <kwi...@keithwiley.com> wrote:

> Yep, the original article is definitely erroneous in this regard.  I
> figured that out eventually.  I'm not sure how much I can trust that
> resource now.  I may have to look elsewhere.  I agree that Edward's
> description is pretty good, but as I said earlier, I can't actually use his
> code, so I'm trying to cobble a workable solution together from the various
> resources available.  Ritesh's article, despite the error in the Hive
> syntax, is still useful in that it enables one to quickly compile a simple
> rank jar without relying on git, maven, or other project dependencies --
> problems which have plagued me with Edward's approach.  So, if I can use
> Ritesh's method to write a simple rank function, and Edward's accurate
> description of how to construct the query, then I can put all the pieces
> together into a workable solution.
>
> I'll let you know if I get it.
>
> On Apr 2, 2013, at 10:56 , Igor Tatarinov wrote:
>
> > You are getting the error because you are ORDERing BY rank but rank is
> > not in the top SELECT
> >
> > Also, DISTRIBUTE BY/SORT BY are done after SELECT so you have to use a
> > subquery:
> > SELECT ..., rank(user)
> > FROM (SELECT ... DISTRIBUTE BY ... SORT BY)
> >
> > igor
> > decide.com
> >
> >
> > On Tue, Apr 2, 2013 at 10:03 AM, Keith Wiley <kwi...@keithwiley.com> wrote:
> > On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
> >
> > > http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
> >
> > Is there any possibility there is a bug in Ritesh Agrawal's query
> > statement from that article?  I created a test table with the exact column
> > names from the example in the article and used a minimally altered version
> > of the command (I removed the where clause to simplify things a bit) and
> > got an error which suggests there is something slightly wrong with the
> > command (or perhaps the table has to be configured a special way).  Here's
> > what I get when I almost perfectly duplicate that example:
> >
> > hive> describe test;
> > OK
> > user    string
> > category        string
> > value   int
> > Time taken: 0.082 seconds
> > ==================================================
> > hive> select * from test;
> > OK
> > user1   cat1    1
> > user1   cat1    2
> > user1   cat1    3
> > user1   cat2    10
> > user1   cat2    20
> > user1   cat2    30
> > user2   cat1    11
> > user2   cat1    21
> > user2   cat1    31
> > user2   cat2    5
> > user2   cat2    6
> > user2   cat2    7
> > Time taken: 0.202 seconds
> > ==================================================
> > hive> SELECT user, category, value
> >     > FROM (
> >     > SELECT user, category, rank(user) as rank, value
> >     > FROM test
> >     > DISTRIBUTE BY user
> >     > SORT BY user, value desc
> >     > ) a
> >     > WHERE rank > 1
> >     > ORDER BY user, rank;
> > FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or column reference 'rank': (possible column names are: _col0, _col1, _col2)
> > hive>
> >
> >
> > ________________________________________________________________________________
> > Keith Wiley     kwi...@keithwiley.com     keithwiley.com     music.keithwiley.com
> >
> > "What I primarily learned in grad school is how much I *don't* know.
> > Consequently, I left grad school with a higher ignorance to knowledge ratio than
> > when I entered."
> >                                            --  Keith Wiley
> >
> > ________________________________________________________________________________
> >
> >
>
>
>
> ________________________________________________________________________________
> Keith Wiley     kwi...@keithwiley.com     keithwiley.com
> music.keithwiley.com
>
> "Luminous beings are we, not this crude matter."
>                                            --  Yoda
>
> ________________________________________________________________________________
>
>
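
To make Igor's point in the quoted message concrete, here is a rough plain-Java stand-in for the query shape he describes: sort first (the inner query), then number the rows per user (the outer query), then filter on that number. It is only an illustration and not HiveQL. The class TopPerUserSketch and its methods are hypothetical, ranks are assumed to start at 0, and the filter keeps the first n rows per user, which I am assuming is the intent given the linked article's title. The rows are the test table from Keith's message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "sort in a subquery, rank outside it".
public class TopPerUserSketch {
    // One row of the thread's test table: (user, category, value).
    static final class Row {
        final String user;
        final String category;
        final int value;

        Row(String user, String category, int value) {
            this.user = user;
            this.category = category;
            this.value = value;
        }
    }

    // The 12 rows shown by "select * from test" in the quoted message.
    static List<Row> testTable() {
        List<Row> rows = new ArrayList<>();
        String[] users = {"user1", "user1", "user2", "user2"};
        String[] cats = {"cat1", "cat2", "cat1", "cat2"};
        int[][] values = {{1, 2, 3}, {10, 20, 30}, {11, 21, 31}, {5, 6, 7}};
        for (int g = 0; g < users.length; g++) {
            for (int v : values[g]) {
                rows.add(new Row(users[g], cats[g], v));
            }
        }
        return rows;
    }

    static List<String> topPerUser(List<Row> rows, int n) {
        // Inner query: sort by user, then by value descending.
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort((a, b) -> {
            int byUser = a.user.compareTo(b.user);
            return byUser != 0 ? byUser : Integer.compare(b.value, a.value);
        });

        // Outer query: number rows within each user, keep ranks below n.
        List<String> out = new ArrayList<>();
        String lastUser = null;
        int rank = 0;
        for (Row r : sorted) {
            if (!r.user.equals(lastUser)) {
                rank = 0;
                lastUser = r.user;
            }
            if (rank < n) {
                out.add(r.user + "\t" + r.category + "\t" + r.value + "\t" + rank);
            }
            rank++;
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : topPerUser(testTable(), 2)) {
            System.out.println(line);
        }
    }
}
```

With n = 2 this prints the two largest values per user (30 and 20 for user1, 31 and 21 for user2). If the ranking loop ran over the unsorted rows instead, the numbers would follow insertion order rather than value order, which is the reason the sort has to sit in the subquery.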
