Hi,
I've hit problems while writing a custom UDTF that should return string
values. I couldn't find documented anywhere what type the values passed
to forward() should have. The only information I could dig up on Google
was a few blogs with examples, plus the 4 UDTFs that are in the Hive
sources. From those I concluded that it should be OK to simply pass
Strings inside the forwarded Object[] array. Here are the relevant
parts of my code:
private Object[] forwardListObj;

@Override
public StructObjectInspector initialize(ObjectInspector[] args)
        throws UDFArgumentException {
    // snipped irrelevant code
    forwardListObj = new Object[1];
    forwardListObj[0] = new String();
    ArrayList<String> fieldNames = new ArrayList<String>(1);
    ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>(1);
    fieldNames.add("section");
    fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
    return ObjectInspectorFactory.getStandardStructObjectInspector(
            fieldNames, fieldOIs);
}
In process() I simply forward a String:
forwardListObj[0] = "";
forward(forwardListObj);
// OR
String s = ...
forwardListObj[0] = s;
forward(forwardListObj);
I tested the function with a simple query:
SELECT my_func(arg) AS x FROM logs WHERE (dt=2011120104);
and it worked just as intended. But when I moved from testing to
actually using the function in more complex queries, I ran into
trouble. Even a LATERAL VIEW clause can cause failures:
SELECT x FROM logs LATERAL VIEW my_func(arg) t AS x WHERE (dt=2011120104);
causes tasks to fail with this exception:
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.hadoop.io.Text
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:45)
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDouble(PrimitiveObjectInspectorUtils.java:607)
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DoubleConverter.convert(PrimitiveObjectInspectorConverter.java:229)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPEqual.evaluate(GenericUDFOPEqual.java:73)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:86)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:56)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd.evaluate(GenericUDFOPAnd.java:52)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:86)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:83)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
    at org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.processOp(LateralViewJoinOperator.java:133)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
    at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:112)
    at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:44)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:81)
    at cz.seznam.im.functions.ExplodeSection.process(ExplodeSection.java:103)
...
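To make clear how I read the top frames of that trace, here is a small
sketch with stand-in classes. It does not use the real Hive classes:
FakeText, StringInspector and InspectorMismatchSketch are names I made
up for illustration, and I am only assuming that something downstream
reads my forwarded String through a Writable-style string inspector
instead of the Java-string one I declared in initialize().

```java
// Illustration only: stand-ins for Hive classes, not the real API.
class InspectorMismatchSketch {

    /** Stand-in for org.apache.hadoop.io.Text (hypothetical, simplified). */
    static final class FakeText {
        private final String value;

        FakeText(String value) {
            this.value = value;
        }

        @Override
        public String toString() {
            return value;
        }
    }

    /** Stand-in for a string object inspector (hypothetical, simplified). */
    interface StringInspector {
        String getPrimitiveJavaObject(Object o);
    }

    // Expects the row object to be a java.lang.String.
    static final StringInspector JAVA_STRING = o -> (String) o;

    // Expects the row object to be a FakeText wrapper.
    static final StringInspector WRITABLE_STRING = o -> ((FakeText) o).toString();

    /** Returns true if reading the object through the inspector fails with a cast error. */
    static boolean throwsClassCast(StringInspector oi, Object forwarded) {
        try {
            oi.getPrimitiveJavaObject(forwarded);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object forwarded = "section-a"; // what my process() hands to forward()

        // Read with the inspector I declared: no cast error.
        System.out.println(throwsClassCast(JAVA_STRING, forwarded));     // false

        // Read with a Writable-style inspector: cast error, as in the trace.
        System.out.println(throwsClassCast(WRITABLE_STRING, forwarded)); // true
    }
}
```

If that reading is right, the String itself is fine for the inspector I
returned, and the failure comes from a different inspector being applied
to the same object later in the operator chain.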
I should also mention that I use a custom SerDe and InputFormat for the
'logs' table. While trying to figure this out, I ran the same queries
as listed above on a different table without those customizations, and
both of them worked correctly. So I think the SerDe and/or InputFormat
probably play some role in this as well. What I don't understand is why
the problem shows up only with LATERAL VIEW. Any ideas, anyone? Also,
is it really correct to pass a String to forward()?
Best regards,
Jan