Thank you, John. It's not quite clear from the page whether my solution:

1. makes sense
2. works now
3. will work in the future if the issue is resolved/implemented

Could you elaborate?

Also, there is no mention of UDF object sharing (between mappers) in the current implementation. Is this a problem? Do I need to use ThreadLocal or something like that?

On Tue, Feb 22, 2011 at 11:42 AM, John Sichi <jsi...@fb.com> wrote:

> Please see the discussion in this JIRA issue:
>
> https://issues.apache.org/jira/browse/HIVE-1994
>
> JVS
>
> On Feb 21, 2011, at 10:45 PM, Igor Tatarinov wrote:
>
> > I would like to implement the moving average as a UDF (instead of a
> > streaming reducer). Here is what I am thinking. Please let me know if I am
> > missing something here:
> >
> > SELECT product, date, mavg(product, price, 10)
> > FROM (
> >   SELECT *
> >   FROM prices
> >   DISTRIBUTE BY product
> >   SORT BY product, date
> > )
> >
> > I have to pass the key to mavg() because it has to detect when one
> > product grouping ends and another starts.
> >
> > Unfortunately, mavg will also need to maintain a state (moving sum and
> > count). That's where I am worried that Hive (Hadoop?) will use a single
> > instance of my UDF to process concurrent groupings and this idea won't work.
> >
> > Is that the main issue? Is there something I can do to fix that?
> >
> > Thanks!
> > igor
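
To make the question concrete, here is a minimal sketch in plain Java of the state that mavg() would have to keep: the current key, a window of recent values, and a running sum. It deliberately uses no Hive classes. The names MovingAverageState, next, and PER_THREAD are hypothetical, and the code assumes rows arrive grouped by key and already sorted within each key, as the DISTRIBUTE BY / SORT BY subquery is meant to guarantee. The ThreadLocal holder is only a precaution in case one instance were ever called from several threads; whether Hive actually does that is the open question here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Objects;

// Hypothetical helper, not part of any Hive API: the state a stateful
// mavg(key, value, size) would need to carry from one row to the next.
class MovingAverageState {
    private final Deque<Double> window = new ArrayDeque<>();
    private String currentKey = null;
    private double sum = 0.0;

    // Assumes rows arrive grouped by key and sorted within each key.
    double next(String key, double value, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        // A change of key marks the start of a new grouping: reset the state.
        if (!Objects.equals(key, currentKey)) {
            currentKey = key;
            window.clear();
            sum = 0.0;
        }
        window.addLast(value);
        sum += value;
        // Drop the oldest values once the window exceeds the requested size.
        while (window.size() > size) {
            sum -= window.removeFirst();
        }
        return sum / window.size();
    }

    // Precaution only (an assumption, not a statement about Hive's behavior):
    // gives each thread its own state object if an instance were ever shared.
    static final ThreadLocal<MovingAverageState> PER_THREAD =
            ThreadLocal.withInitial(MovingAverageState::new);
}
```

A UDF wrapper would call MovingAverageState.PER_THREAD.get().next(product, price, 10) once per row. Note that the ThreadLocal does nothing for correctness if rows for different products are interleaved on the same thread; that part depends entirely on the ordering produced by the subquery.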