Hi,
Hive 0.9.0 + Elephant-Bird 3.0.7
I ran into a problem using Elephant-Bird with Hive. I think I know what causes
it, but I don't know which side the bug belongs to. Let me explain the
problem.
If I define a Google protobuf file with a field name like 'dateString' (the
field name contains an uppercase 'S'), then when I query the table like this:
select dateString from table .............
I will get the following exception trace:
Caused by: java.lang.RuntimeException: cannot find field datestring from [org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@49aacd5f .....................
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:321)
    at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldRef(UnionStructObjectInspector.java:96)
    at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
    at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878)
    at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:73)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
Here is the code of the method that throws this error:
    public static StructField getStandardStructFieldRef(String fieldName,
        List<? extends StructField> fields) {
      fieldName = fieldName.toLowerCase();
      for (int i = 0; i < fields.size(); i++) {
        if (fields.get(i).getFieldName().equals(fieldName)) {
          return fields.get(i);
        }
      }
      // For backward compatibility: fieldNames can also be integer Strings.
      try {
        int i = Integer.parseInt(fieldName);
        if (i >= 0 && i < fields.size()) {
          return fields.get(i);
        }
      } catch (NumberFormatException e) {
        // ignore
      }
      throw new RuntimeException("cannot find field " + fieldName + " from " + fields);
      // return null;
    }
I understand that the problem happens because, at this point, fieldName is
"datestring" (all lowercase characters), but the fields list holds the name of
that field as "dateString". The exact-match comparison fails, and that is why
the RuntimeException is thrown.
But I don't know which side this bug belongs to, and I would like to know more
about the contract that the Hive implementation expects here.
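
To make the mismatch concrete, here is a minimal standalone sketch of what I
believe is happening. It is not Hive code: the class CaseMismatchSketch and the
method findField are hypothetical names of mine, and I use plain strings
instead of StructField objects. It only mimics the lowercase-then-exact-match
logic of the method quoted above.

```java
import java.util.Arrays;
import java.util.List;

public class CaseMismatchSketch {

    // Hypothetical stand-in for the quoted lookup: the requested name is
    // lowercased, but the stored names are compared as they are.
    // Returns the index of the matching name, or -1 if none matches.
    static int findField(String fieldName, List<String> fieldNames) {
        fieldName = fieldName.toLowerCase();
        for (int i = 0; i < fieldNames.size(); i++) {
            if (fieldNames.get(i).equals(fieldName)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed field names, kept exactly as written in the .proto file.
        List<String> fromProtobuf = Arrays.asList("id", "dateString");

        // "dateString" becomes "datestring", which never equals "dateString".
        System.out.println(findField("dateString", fromProtobuf)); // prints -1

        // An all-lowercase field name is found without trouble.
        System.out.println(findField("id", fromProtobuf)); // prints 0
    }
}
```

In the real method the "not found" case ends in the RuntimeException shown in
the trace instead of returning -1.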
From this link:
https://cwiki.apache.org/Hive/user-faq.html#UserFAQ-AreHiveQLidentifiers%2528e.g.tablenames%252Ccolumnnames%252Cetc%2529casesensitive%253F
I know that in Hive, table names and column names are case insensitive. So
even though I used "select dateString" in my query, the fieldName was changed
to "datestring" in the code, while the StructField of the ObjectInspector from
Elephant-Bird returns the EXACT field name as defined in the protobuf file,
"dateString" in this case. Of course, I can change my proto file to use only
lowercase field names to work around this, but my questions are:
1) If I implement my own ObjectInspector, do I need to pay attention to the
case of the field names? Do they have to be lowercase?
2) I would consider this a bug in Hive, right? If this line:
    fieldName = fieldName.toLowerCase();
lowercases the requested name, then the comparison should lowercase the other
side as well, by changing
    if (fields.get(i).getFieldName().equals(fieldName))
to
    if (fields.get(i).getFieldName().toLowerCase().equals(fieldName))
right?
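
For reference, here is a small standalone sketch of the two options as I see
them. Again, this is not Hive or Elephant-Bird code: CaseInsensitiveLookupSketch,
findFieldIgnoreCase and lowerCaseNames are hypothetical names, and plain strings
stand in for StructField objects. Option A is the change I am proposing on the
Hive side; option B is what an ObjectInspector could do if lowercase names turn
out to be the expected contract.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CaseInsensitiveLookupSketch {

    // Option A (lookup side): lowercase both the requested name and each
    // stored name before comparing. Locale.ROOT is my own choice here, to
    // keep the result independent of the JVM default locale.
    static int findFieldIgnoreCase(String fieldName, List<String> fieldNames) {
        String wanted = fieldName.toLowerCase(Locale.ROOT);
        for (int i = 0; i < fieldNames.size(); i++) {
            if (fieldNames.get(i).toLowerCase(Locale.ROOT).equals(wanted)) {
                return i;
            }
        }
        return -1;
    }

    // Option B (ObjectInspector side): expose only lowercase field names,
    // so the existing exact-match lookup succeeds.
    static List<String> lowerCaseNames(List<String> fieldNames) {
        List<String> result = new ArrayList<String>();
        for (String name : fieldNames) {
            result.add(name.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> fromProtobuf = Arrays.asList("id", "dateString");
        System.out.println(findFieldIgnoreCase("dateString", fromProtobuf)); // prints 1
        System.out.println(lowerCaseNames(fromProtobuf)); // prints [id, datestring]
    }
}
```

One caveat with option A: if two fields differ only by case, the lookup returns
the first one, so the names are no longer guaranteed to be unambiguous.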
Thanks
Yong