When I try using rownum in my Hive QL query, I get: "Invalid column reference rownum". Am I missing something here?
Regards, Karan -----Original Message----- From: David Kulp [mailto:dk...@fiksu.com] Sent: 10 April 2012 20:15 To: user@hive.apache.org Subject: Re: Lag function in Hive New here. Hello all. Could you try a self-join, possibly also restricted to partitions? E.g. SELECT t2.value - t1.value FROM mytable t1, mytable t2 WHERE t1.rownum = t2.rownum+1 AND t1.partition=foo AND t2.partition=bar If your data is clustered by rownum, then this join should, in theory, be relatively fast -- especially if it makes sense to exploit partitions. -d On Apr 10, 2012, at 10:37 AM, <karanveer.si...@barclays.com> <karanveer.si...@barclays.com> wrote: > Makes sense but is not the distribution across nodes for a chunk of records > in that order. > > If Hive cannot help me do this, is there another way I can do this? I tried > generating an identifier using the perl script invoked using Hive but it does > not seem to work fine. While the stand alone script works fine, when the > record is created in hive using std output from perl - I see 2 records for > some of the unique identifiers. I explored the possibility of default data > type changes but that does not solve the problem. > > Regards, > Karan > > > -----Original Message----- > From: Philip Tromans [mailto:philip.j.trom...@gmail.com] > Sent: 10 April 2012 19:48 > To: user@hive.apache.org > Subject: Re: Lag function in Hive > > Hi Karan, > > To the best of my knowledge, there isn't one. It's also unlikely to > happen because it's hard to parallelise in a map-reduce way (it > requires knowing where you are in a result set, and who your > neighbours are and they in turn need to be present on the same node as > you which is difficult to guarantee). > > Cheers, > > Phil. > > On 10 April 2012 14:44, <karanveer.si...@barclays.com> wrote: >> Hi, >> >> Is there something like a 'lag' function in HIVE? The requirement is to >> calculate difference for the same column for every 2 subsequent records. >> >> For example. >> >> Row, Column A, Column B >> 1, 10, 100 >> 2, 20, 200 >> 3, 30, 300 >> >> >> The result that I need should be like: >> >> Row, Column A, Column B, Result >> 1, 10, 100, NULL >> 2, 20, 200, 100 (200-100) >> 3, 30, 300, 100 (300-200) >> >> Rgds, >> Karan >> >> >> >> >> >> This e-mail and any attachments are confidential and intended solely for the >> addressee and may also be privileged or exempt from disclosure under >> applicable law. If you are not the addressee, or have received this e-mail >> in error, please notify the sender immediately, delete it from your system >> and do not copy, disclose or otherwise act upon any part of this e-mail or >> its attachments. >> >> Internet communications are not guaranteed to be secure or virus-free. >> The Barclays Group does not accept responsibility for any loss arising from >> unauthorised access to, or interference with, any Internet communications by >> any third party, or from the transmission of any viruses. Replies to this >> e-mail may be monitored by the Barclays Group for operational or business >> reasons. >> >> Any opinion or other information in this e-mail or its attachments that does >> not relate to the business of the Barclays Group is personal to the sender >> and is not given or endorsed by the Barclays Group. >> >> Barclays Bank PLC.Registered in England and Wales (registered no. 1026167). >> Registered Office: 1 Churchill Place, London, E14 5HP, United Kingdom. >> >> Barclays Bank PLC is authorised and regulated by the Financial Services >> Authority.