On 11/11/2014 01:07 PM, Jean-Pascal Billaud wrote:
While running "select * from parquet_requests", the whole thing crashes
with the following exception:

   > public ArrayWritableGroupConverter(final GroupType groupType,
   >     final HiveGroupConverter parent, final int index) {
   >   this.parent = parent;
   >   this.index = index;
   >   int count = groupType.getFieldCount();
   >   if (count < 1 || count > 2) {
   >     throw new IllegalStateException("Field count must be either 1 or 2: " + count);
   >   }
   >

What this means is that requests_tuple is not considered a valid list
because it has more than one field. The converter basically expects the
"repeated" keyword on "requests (LIST)" as opposed to "requests_tuple".
The current code also does not seem to handle repeated primitives, since
the ETypeConverters always call parent.set() and therefore always replace
the previously stored instance.
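
To make the second point concrete, here is a minimal, self-contained sketch
of why a set()-style call loses repeated values. The classes
OverwritingParent and AccumulatingParent are hypothetical stand-ins written
for this illustration; they are not the Hive converter classes, and the
accumulating variant is only one possible way to keep every value.

```java
import java.util.ArrayList;
import java.util.List;

class RepeatedPrimitiveSketch {

    /** Hypothetical parent with one slot per field: set() replaces the slot. */
    static final class OverwritingParent {
        private final Object[] slots;

        OverwritingParent(int fieldCount) {
            slots = new Object[fieldCount];
        }

        void set(int index, Object value) {
            slots[index] = value; // the previously stored instance is dropped
        }

        Object get(int index) {
            return slots[index];
        }
    }

    /** Hypothetical alternative: each repeated value is appended to a list. */
    static final class AccumulatingParent {
        private final List<List<Object>> slots = new ArrayList<>();

        AccumulatingParent(int fieldCount) {
            for (int i = 0; i < fieldCount; i++) {
                slots.add(new ArrayList<>());
            }
        }

        void add(int index, Object value) {
            slots.get(index).add(value);
        }

        List<Object> get(int index) {
            return slots.get(index);
        }
    }

    public static void main(String[] args) {
        OverwritingParent overwriting = new OverwritingParent(1);
        AccumulatingParent accumulating = new AccumulatingParent(1);

        // Simulate a repeated int32 field that carries three values in one record.
        for (int value : new int[] {1, 2, 3}) {
            overwriting.set(0, value);
            accumulating.add(0, value);
        }

        System.out.println(overwriting.get(0));  // prints 3: only the last value survives
        System.out.println(accumulating.get(0)); // prints [1, 2, 3]
    }
}
```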

I cooked up a patch which, as far as I can tell, would fix the issues here,
and I would like some comments on whether it is going in the right
direction before submitting a more formal pull request. Things need to be
polished, so please don't spend too much time on the form but rather on
the approach.

https://github.com/jpbillaud/hive/commit/4c1de69b0c484903d663b920c1bfbdf8cd9b920d

Moreover, I have a feeling that I should probably not pass the thrift class
for the parquet table, given that at this point it is totally irrelevant
and the parquet schema is stored in the parquet files. I also expect some
ObjectInspector issues due to the extra grouping provided by the
requests_tuple entry. Thoughts?

Thanks,


Hi Jean-Pascal,

This is a known issue that we're going to be fixing shortly. The problem is that there's a difference in the way Hive and Thrift (or Avro) represent lists. PARQUET-113 [1] is an effort to define what is currently being written and what we need to do to add compatibility. It also specifies what should be written.

Hive is one of the first object models that will be updated with the backward-compatibility rules so that it can read parquet-avro and parquet-thrift structures correctly.
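
As a rough illustration of what such a backward-compatibility rule could look like, the sketch below decides whether the repeated field inside a LIST group is itself the element or only a wrapper around it. It is a simplification based on the observation earlier in this thread (a repeated group with more than one field, like requests_tuple, has to be the element); PARQUET-113 is the authority on the actual rules, which may differ. RepeatedField, repeatedFieldIsElement, and the field names field_a, field_b, bag, and array_element are all assumptions made for this example and not part of any Parquet or Hive API.

```java
import java.util.Arrays;
import java.util.List;

class ListElementRuleSketch {

    /** Hypothetical, simplified view of the repeated field inside a LIST group. */
    static final class RepeatedField {
        final String name;
        final boolean isGroup;
        final List<String> childNames;

        RepeatedField(String name, boolean isGroup, List<String> childNames) {
            this.name = name;
            this.isGroup = isGroup;
            this.childNames = childNames;
        }
    }

    /**
     * Assumed rule of thumb: a repeated primitive, or a repeated group with
     * several fields, is the list element itself. A repeated group with a
     * single field is treated here as a wrapper around the element.
     */
    static boolean repeatedFieldIsElement(RepeatedField repeated) {
        if (!repeated.isGroup) {
            return true;
        }
        return repeated.childNames.size() > 1;
    }

    public static void main(String[] args) {
        // Thrift-style layout from the thread: the tuple group carries the fields.
        RepeatedField thriftStyle = new RepeatedField(
            "requests_tuple", true, Arrays.asList("field_a", "field_b"));

        // Wrapper-style layout (names assumed): one child holds the element.
        RepeatedField wrapperStyle = new RepeatedField(
            "bag", true, Arrays.asList("array_element"));

        System.out.println(repeatedFieldIsElement(thriftStyle));  // prints true
        System.out.println(repeatedFieldIsElement(wrapperStyle)); // prints false
    }
}
```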

rb

[1]: https://issues.apache.org/jira/browse/PARQUET-113

--
Ryan Blue
Software Engineer
Cloudera, Inc.
