Rohan,

You could do it one of the following ways:

1) Write a UDAF that does the avg(f2 - avg_f2) computation.
2) Write a custom reducer that does the avg(f2 - avg_f2) computation.
3) Do it with multiple passes over the data. Something like this (untested):
select table.f1, avg_table.avg_f2, avg(table.f2 - avg_table.avg_f2)
from (
  select f1, avg(f2) as avg_f2 from table group by f1
) avg_table
join table on (avg_table.f1 = table.f1)
group by table.f1, avg_table.avg_f2;

Mark

Mark Grover, Business Intelligence Analyst
OANDA Corporation
www: oanda.com www: fxtrade.com
e: mgro...@oanda.com

"Best Trading Platform" - World Finance's Forex Awards 2009.
"The One to Watch" - Treasury Today's Adam Smith Awards 2009.

----- Original Message -----
From: "rohan monga" <monga.ro...@gmail.com>
To: user@hive.apache.org
Sent: Friday, January 20, 2012 6:00:54 PM
Subject: Re: Invoke a UDAF inside another UDAF

My bad, I hastily converted the query into a wrong example. It should be like this:

select f1, avg(f2) as avg_f2, avg(f2 - avg_f2) from table group by f1;

In essence, I just want to use the value generated by one UDAF (in this case avg(f2)) as a single number, and then apply that value to the group inside a different UDAF. For example, if I were to use a streaming reducer, it would be something like this:

avg1 = computeSum(list) / len(list)
return computeSum(x - avg1 for x in list) / len(list)

As I write this I realize why this might not be possible [the group computation is done in one step and the information is lost] :)

But why the NullPointerException?

Regards,
--
Rohan Monga

On Fri, Jan 20, 2012 at 2:32 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

IMHO you cannot nest the percentile calculation, because the results would be meaningless. percentile has to aggregate a set and pick the Nth element, but if you nest, the inner percentile returns only one result to the outer percentile, and that is pretty meaningless. (I think someone talked about this on the list in the last month or so.) Without seeing your input data and your expected results, I cannot tell what your query is meant to do or suggest an alternative.
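As a side note for readers of the archive: the two-pass computation that both the join query and the streaming-reducer sketch above describe can be written out in plain Python. This is only an illustrative sketch (the function and variable names are made up, not Hive APIs): pass one computes avg(f2) per f1 group, pass two averages the deviations from that group average.

```python
from collections import defaultdict

def avg_deviation_by_group(rows):
    """rows: iterable of (f1, f2) pairs.
    Returns {f1: (avg_f2, avg(f2 - avg_f2))} for each group."""
    groups = defaultdict(list)
    for f1, f2 in rows:
        groups[f1].append(f2)

    result = {}
    for f1, values in groups.items():
        # Pass 1: the group average, i.e. avg(f2) as avg_f2.
        avg_f2 = sum(values) / len(values)
        # Pass 2: average deviation from that group average.
        dev = sum(v - avg_f2 for v in values) / len(values)
        result[f1] = (avg_f2, dev)
    return result
```

Note that it needs both passes over each group's values, which is exactly why a single nested UDAF call cannot express it: by the time the outer aggregate runs, the per-row values are no longer available.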
On 1/20/12, rohan monga <monga.ro...@gmail.com> wrote:
> Thanks Edward, that seems to work :)
>
> However, I have another query like this:
>
> select a, avg(b) as avg_b,
>        percentile_approx(avg_b - percentile_approx(b, .5), .5)
> from table1 group by a
>
> Here I will lose the group info if I move the inner query into the FROM
> clause. Is there a way to get this to work?
>
> Thanks,
> --
> Rohan Monga
>
> On Fri, Jan 20, 2012 at 12:51 PM, Edward Capriolo
> <edlinuxg...@gmail.com> wrote:
>
>> I think if you are grouping by b, b has to be in your select list. Try
>> this:
>>
>> FROM (
>>   select b, count(a) as theCount from table1 group by b
>> ) a select mean(theCount);
>>
>> I think that should work.
>>
>> On 1/20/12, rohan monga <monga.ro...@gmail.com> wrote:
>> > Hi,
>> >
>> > I am trying to run a query like
>> >
>> > select mean(count(a)) from table1 group by b;
>> >
>> > I am getting the following error:
>> >
>> > <snip>
>> > FAILED: Hive Internal Error: java.lang.NullPointerException(null)
>> > java.lang.NullPointerException
>> >   at org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:151)
>> >   at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:656)
>> >   at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:777)
>> >   at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
>> >   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
>> >   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:125)
>> >   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
>> >   at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:157)
>> >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7447)
>> >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7405)
>> >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2747)
>> >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:3365)
>> >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5858)
>> >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6480)
>> >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7223)
>> >   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:240)
>> >   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:428)
>> >   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
>> >   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
>> >   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
>> >   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210)
>> >   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401)
>> >   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>> >   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:635)
>> >   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:552)
>> >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >   at java.lang.reflect.Method.invoke(Method.java:597)
>> >   at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
>> > </snip>
>> >
>> > Is there a workaround? I have tried with Hive 0.7.1 and 0.8.
>> >
>> > Thanks,
>> > --
>> > Rohan Monga
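For archive readers: the FROM-clause workaround Edward suggests above (aggregate once in a subquery, then aggregate the aggregates) amounts to a two-step computation. A minimal Python sketch of the same shape, assuming rows of (a, b) pairs (the function name is made up for illustration; this is not Hive code):

```python
from collections import Counter

def mean_of_group_counts(rows):
    """rows: iterable of (a, b) pairs.
    Step 1 (the inner query): count rows per distinct value of b.
    Step 2 (the outer query): average those per-group counts."""
    counts = Counter(b for _a, b in rows)
    return sum(counts.values()) / len(counts)
```

The point of the rewrite is the same in both languages: each aggregation level is a separate, complete pass, rather than one aggregate call nested directly inside another.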