Do you mean that my UDF would store the timestamp of the current row in a 
static field in the UDF's implementation, and when processing the next row, use 
that field to get the previous row's value ?

Can anyone comment on whether that's safe, re: I'm not familiar with Hive 
internals ?

Thanks,

pk

Sent from my iPhone

On Jun 10, 2011, at 11:18 PM, Tim Spence <yogi.wan.ken...@gmail.com> wrote:

> Praveen,
> This would be best accomplished with a UDF because Hive does not support 
> cursors.
> Best of luck,
> Tim
> 
> 
> 
> 
> On Fri, Jun 10, 2011 at 10:29 PM, Praveen Kumar <pk1u...@gmail.com> wrote:
> If I have table timestamps:
> 
> hive> desc timestamps;
> 
> OK
> ts      bigint
> 
> 
> hive> select ts from timestamps order by ts
> OK
> 
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> 10
> 30
> 32
> 34
> 36
> 38
> 40
> 42
> 44
> 46
> 48
> 50
> 70
> 74
> 78
> 100
> 105
> 110
> 115
> 
> and I want to make groups of the values where splits between groups
> occur where two time-consecutive entries have a difference greater
> than 10.
> 
> Eg, above, the splits would be such that the numbers would be grouped
> into these ranges:
> 
> 0-10
> 30-50
> 70-78
> 100-115
> 
> because (30 - 10), (70 - 50), and (100 - 78) are each greater than 10.
> 
> I'd like the query to result in the following:
> 
> hive> select ...
> 
> 0       7
> 0       9
> 0       6
> 0       3
> 0       10
> 0       1
> 0       4
> 0       5
> 0       8
> 0       2
> 30      34
> 30      44
> 30      40
> 30      38
> 30      36
> 30      32
> 30      46
> 30      42
> 30      48
> 30      50
> 30      30
> 70      74
> 70      70
> 70      78
> 100     100
> 100     105
> 100     110
> 100     115
> 
> What is the most efficient hive query that will do this ? Thanks,
> 
> pk
> 

Reply via email to