Another email thread led me to HIVE-5038<https://issues.apache.org/jira/browse/HIVE-5038>("rank operator is case-sensitive and has odd semantics") -- it's resolved as invalid, but is that only for the odd semantics?
Perhaps this issue is clarified in more recent emails. I'm catching up on a huge backlog. -- Lefty On Tue, Sep 3, 2013 at 4:03 AM, Lefty Leverenz <leftylever...@gmail.com>wrote: > What's the answer -- does the "rank" keyword have to be lowercase? > > If lowercase is obligatory we need to revise the wiki, which shows all > uppercase ( > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics > ). > > In the test files it's lowercase (windowing_rank.q, > ptf_negative_WhereWithRankCond.q). > The patch for HIVE-896 shows a lowercase name in GenericUDAFRank.java but > I don't know if that means lowercase is required: > > @WindowFunctionDescription >> >> ( >> >> description = @Description( >> >> name = "rank", >> >> value = "_FUNC_(x)" >> >> ), >> >> supportsWindow = false, >> >> pivotResult = true >> >> ) >> > > > And what about the other keywords in the wikidoc? Same lowercase > requirement? > > -- Lefty > > > On Fri, Jul 26, 2013 at 5:30 PM, saurabh <mpp.databa...@gmail.com> wrote: > >> Hi all, >> >> Below are some of observations based on the on-going rank function >> discussion. >> >> 1. I executed below mentioned queries and only the query with "rank" >> (lowercase) executed successfully, rest were throwing exceptions "FAILED: >> SemanticException Failed to breakup Windowing invocations into Groups." >> >> - select cust_id, ord_dt, RANK() w from cust_ord window w as (partition >> by cust_id order by ord_dt); >> >> - select cust_id, ord_dt, Rank() w from cust_ord window w as (partition >> by cust_id order by ord_dt); >> >> - select cust_id, ord_dt, rank() w from cust_ord window w as (partition >> by cust_id order by ord_dt); >> >> It seems "rank" keyword is case-sensitive. Attached is the screenshot >> for reference. >> >> 2. I created a dummy table with the data provided in the below mail trail >> and achieved the expected output, using the below mentioned query. >> >> *select cust_id, ord_dt, rank() over (partition by cust_id order by >> ord_dt) from cust_ord;* >> >> Request all to kindly review these details and suggest if it was of any >> help! >> >> Thanks. >> >> >> On Sat, Jul 27, 2013 at 12:07 AM, j.barrett Strausser < >> j.barrett.straus...@gmail.com> wrote: >> >>> Any further help on this, otherwise I'll file a jira. >>> >>> >>> On Wed, Jul 24, 2013 at 11:32 PM, j.barrett Strausser < >>> j.barrett.straus...@gmail.com> wrote: >>> >>>> As an example : If I run my query above removing the arg the following >>>> is thrown. >>>> >>>> FAILED: SemanticException Failed to breakup Windowing invocations into >>>> Groups. At least 1 group must only depend on input columns. Also check for >>>> circular dependencies. >>>> Underlying error: >>>> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more >>>> arguments are expected. >>>> >>>> >>>> Similar issue and fix here: >>>> >>>> http://www.marshut.com/rqvpz/use-rank-over-partition-function-in-hive-11.html >>>> >>>> Even if it didn't require an arg it still doesn't explain my anomalous >>>> output. >>>> >>>> >>>> >>>> On Wed, Jul 24, 2013 at 11:28 PM, j.barrett Strausser < >>>> j.barrett.straus...@gmail.com> wrote: >>>> >>>>> That isn't true. If you try to run the above HIVE without an argument, >>>>> it will throw an exception. I have seen other users replicate this problem >>>>> as well. >>>>> >>>>> I can file a JIRA if someone can confirm that my query should work. >>>>> >>>>> >>>>> On Wed, Jul 24, 2013 at 11:02 PM, manishbh...@rocketmail.com < >>>>> manishbh...@rocketmail.com> wrote: >>>>> >>>>>> Analytical function doesn't expect any argument. Rank() itself enough >>>>>> to sequence based on the window you have defined in partition by. So >>>>>> >>>>>> Rank() over (partition by cmscustid order by orderdate) >>>>>> >>>>>> Should work as long as I have wrote right syntax for hive. >>>>>> >>>>>> Sent via Rocket from my HTC >>>>>> >>>>>> ----- Reply message ----- >>>>>> From: "j.barrett Strausser" <j.barrett.straus...@gmail.com> >>>>>> To: <user@hive.apache.org> >>>>>> Subject: Semantics of Rank. >>>>>> Date: Thu, Jul 25, 2013 1:08 AM >>>>>> >>>>>> >>>>>> Thanks for the reply. Perhaps my misunderstanding of the relation >>>>>> between >>>>>> rank and the windowing function is wrong. >>>>>> >>>>>> What I want to achieve for the following is : For a given customer id, >>>>>> sort his orders. I thought the below would work. >>>>>> >>>>>> SELECT eh.cmsorderid, eh.orderdate, RANK(orderdate) w FROM order_data >>>>>> eh >>>>>> window w as (partition by cmscustid order by orderdate); >>>>>> >>>>>> The rank function instead returns the rank of the order date over all >>>>>> all >>>>>> order dates. >>>>>> >>>>>> Example snippet from above >>>>>> >>>>>> Actual : >>>>>> >>>>>> 6758783 27APR2012 94 >>>>>> 6758783 23JUN2012 95 >>>>>> 6758785 14DEC2012 96 >>>>>> 6758795 18DEC2011 97 >>>>>> 6758796 06MAY2012 98 >>>>>> 6758798 24MAR2013 99 >>>>>> 6758799 23NOV2012 100 >>>>>> >>>>>> >>>>>> Expected : >>>>>> >>>>>> 6758783 27APR2012 1 >>>>>> 6758783 23JUN2012 2 >>>>>> 6758785 14DEC2012 1 >>>>>> 6758795 18DEC2011 1 >>>>>> 6758796 06MAY2012 1 >>>>>> 6758798 24MAR2013 1 >>>>>> 6758799 23NOV2012 1 >>>>>> >>>>>> >>>>>> -b >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Jul 24, 2013 at 3:17 PM, Shahar Glixman < >>>>>> sglix...@outbrain.com>wrote: >>>>>> >>>>>> > the argument to rank is simply some value, whereas the rank function >>>>>> > compare this value >>>>>> > to the previous value received, if value is same, rank returns >>>>>> ++index, >>>>>> > otherwise, rank return 1. >>>>>> > pseudo code: >>>>>> > >>>>>> > class Rank { >>>>>> > int index; >>>>>> > Object previousValue = null; >>>>>> > int evaluate(Object value) { >>>>>> > if (value == previousValue) { >>>>>> > return ++index; >>>>>> > } >>>>>> > previousValue = value; >>>>>> > index = 1; >>>>>> > return 1; >>>>>> > } >>>>>> > >>>>>> > >>>>>> > On Wed, Jul 24, 2013 at 9:59 PM, j.barrett Strausser < >>>>>> > j.barrett.straus...@gmail.com> wrote: >>>>>> > >>>>>> >> It seems as though I am required to pass in an argument to RANK(). >>>>>> >> >>>>>> >> What is the effect of passing this argument in ? >>>>>> >> >>>>>> >> In a RANK function the output for RANK should be the number of rows >>>>>> >> preceding >>>>>> >> a row in a given window. As windows are specified by the >>>>>> partition and >>>>>> >> order by fields I don't understand the effect of passing in an arg >>>>>> to RANK >>>>>> >> >>>>>> >> Are the other non-aggregate function similar? Meaning : Cume_dist, >>>>>> Ntile >>>>>> >> -- >>>>>> >> >>>>>> >> >>>>>> >> https://github.com/bearrito >>>>>> >> @deepbearrito >>>>>> >> >>>>>> > >>>>>> > >>>>>> > The above terms reflect a potential business arrangement, are >>>>>> provided solely >>>>>> > as a basis for further discussion, and are not intended to be and >>>>>> do not >>>>>> > constitute a legally binding obligation. No legally binding >>>>>> obligations will >>>>>> > be created, implied, or inferred until an agreement in final form >>>>>> is executed >>>>>> > in writing by all parties involved. >>>>>> > >>>>>> > This email and any attachments hereto may be confidential or >>>>>> privileged. >>>>>> > If you received this communication by mistake, please don't >>>>>> forward it >>>>>> > to anyone else, please erase all copies and attachments, and please >>>>>> let >>>>>> > me know that it has gone to the wrong person. Thanks. >>>>>> > >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> >>>>>> https://github.com/bearrito >>>>>> @deepbearrito >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> >>>>> https://github.com/bearrito >>>>> @deepbearrito >>>>> >>>> >>>> >>>> >>>> -- >>>> >>>> >>>> https://github.com/bearrito >>>> @deepbearrito >>>> >>> >>> >>> >>> -- >>> >>> >>> https://github.com/bearrito >>> @deepbearrito >>> >> >> >