Please see the discussion in this JIRA issue:

https://issues.apache.org/jira/browse/HIVE-1994

JVS

On Feb 21, 2011, at 10:45 PM, Igor Tatarinov wrote:

> I would like to implement the moving average as a UDF (instead of a streaming 
> reducer). Here is what I am thinking. Please let me know if I am missing 
> something here:
> 
> SELECT product, date, mavg(product, price, 10)
> FROM (
>   SELECT *
>   FROM prices
>   DISTRIBUTE BY product
>   SORT BY product, date
> )
> 
> I have to pass the key to mavg() because it has to detect when one product 
> grouping ends and another starts.
> 
> Unfortunately, mavg will also need to maintain a state (moving sum and 
> count). That's where I am worried that Hive (Hadoop?) will use a single 
> instance of my UDF to process concurrent groupings and this idea won't work. 
> 
> Is that the main issue? Is there something I can do to fix that?
> 
> Thanks!
> igor
> 

Reply via email to