Please see the discussion in this JIRA issue: https://issues.apache.org/jira/browse/HIVE-1994

JVS

On Feb 21, 2011, at 10:45 PM, Igor Tatarinov wrote:

> I would like to implement the moving average as a UDF (instead of a streaming
> reducer). Here is what I am thinking. Please let me know if I am missing
> something here:
>
> SELECT product, date, mavg(product, price, 10)
> FROM (
> SELECT *
> FROM prices
> DISTRIBUTE BY product
> SORT BY product, date
> )
>
> I have to pass the key to mavg() because it has to detect when one product
> grouping ends and another starts.
>
> Unfortunately, mavg will also need to maintain a state (moving sum and
> count). That's where I am worried that Hive (Hadoop?) will use a single
> instance of my UDF to process concurrent groupings and this idea won't work.
>
> Is that the main issue? Is there something I can do to fix that?
>
> Thanks!
> igor
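
For reference, the state described in the quoted message (a moving sum and count that reset when the product key changes) can be sketched in plain Java. This is only an illustration of the bookkeeping, not a Hive UDF: the class name MovingAverageState and its next() method are hypothetical, it uses no Hive API, and it assumes rows reach a single instance already sorted by product and then date. It says nothing about how Hive creates or shares UDF instances.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Objects;

// Hypothetical helper (not part of Hive): holds the state that a call like
// mavg(key, value, window) would have to carry from one row to the next.
class MovingAverageState {
    private final int window;                                 // maximum number of rows averaged
    private final Deque<Double> recent = new ArrayDeque<>();  // values currently in the window
    private String currentKey = null;                         // key of the grouping in progress
    private double sum = 0.0;                                 // running sum of the values in 'recent'

    MovingAverageState(int window) {
        if (window <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        this.window = window;
    }

    // Feed one row. Assumes rows arrive sorted by key, so a change of key
    // means the previous grouping has ended and the state can be reset.
    double next(String key, double value) {
        if (!Objects.equals(key, currentKey)) {
            currentKey = key;
            recent.clear();
            sum = 0.0;
        }
        recent.addLast(value);
        sum += value;
        if (recent.size() > window) {
            sum -= recent.removeFirst();  // drop the oldest value once the window is full
        }
        return sum / recent.size();       // the count is the current window size
    }
}
```

If two groupings were ever interleaved through the same instance, the key check would reset the window on every switch and the averages would be wrong, which is the concern raised in the quoted message.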