Hi Guys,

We have an Avro-backed production table bucketed by a column, and it works
fine with the old Hive 0.7 on CDH3u2.

Now we have moved to Hive 0.12 on HDP 2.0 and are running into bucketing
issues with this table.

Below is the CREATE TABLE statement:

CREATE TABLE test_bucketing
  PARTITIONED BY (year INT, month INT, day INT, hour INT)
  CLUSTERED BY (akey) INTO 32 BUCKETS
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  WITH SERDEPROPERTIES (
    'schema.literal'='{
  "type" : "record",
  "name" : "NgEventETLSimple",
  "namespace" : "com.ngmoco.ngpipes.etl.events",
  "fields" : [ {
    "name" : "akey",
    "type" : "string"
  } ]
}')
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'

and this is the error we are seeing:

FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask. Bucket columns akey is not part of
the table columns
([FieldSchema(name:error_error_error_error_error_error_error, type:string,
comment:from deserializer), FieldSchema(name:cannot_determine_schema,
type:string, comment:from deserializer), FieldSchema(name:check,
type:string, comment:from deserializer), FieldSchema(name:schema,
type:string, comment:from deserializer), FieldSchema(name:url, type:string,
comment:from deserializer), FieldSchema(name:and, type:string, comment:from
deserializer), FieldSchema(name:literal, type:string, comment:from
deserializer)])

As you can see, akey is defined in the Avro schema literal that is supposed
to supply the table's columns, yet Hive still complains that akey is not part
of the table columns. The column list in the error also looks like the
AvroSerDe's "cannot determine schema" placeholder fields rather than the real
schema.

I can confirm that the Avro SerDe works fine on its own [if we drop the
bucketing clause] and that bucketing works fine on its own on a test table
[without the Avro SerDe]. *However, the combination of bucketing and an
Avro-backed table is giving us issues on Hive 0.12.*
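
Roughly, the two simplified variants that do create without errors look like
this (the table names and the trimmed-down single-column schema are just
illustrative; the real statements mirror the production table):

-- Avro SerDe, no bucketing clause: this one creates fine
CREATE TABLE test_avro_no_buckets
  PARTITIONED BY (year INT, month INT, day INT, hour INT)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  WITH SERDEPROPERTIES (
    'schema.literal'='{
  "type" : "record",
  "name" : "NgEventETLSimple",
  "namespace" : "com.ngmoco.ngpipes.etl.events",
  "fields" : [ { "name" : "akey", "type" : "string" } ]
}')
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';

-- Bucketing, no Avro SerDe: explicit column list, also creates fine
CREATE TABLE test_buckets_no_avro (akey STRING)
  PARTITIONED BY (year INT, month INT, day INT, hour INT)
  CLUSTERED BY (akey) INTO 32 BUCKETS
  STORED AS TEXTFILE;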

Any idea what is going on? Any help/pointers would be useful.

Sagar
PS - The actual production table has many more columns, but I don't think
that is relevant to the issue at hand.
