I have some Avro data files partitioned by date, such as:

/items/yyyy/mm/dd/part-r-00000.avro

I want to create a Hive table over this data, so I wrote the following
CREATE TABLE statement:

CREATE EXTERNAL TABLE items
PARTITIONED BY (year STRING, month STRING, day STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/items'
TBLPROPERTIES ( 'avro.schema.url' = 'file://${schemabase}/Item.avsc' );

The item schema is:

{
  "type" : "record",
  "name" : "Item",
  "fields" : [ {
    "name" : "year",
    "type" : "string"
  }, {
    "name" : "month",
    "type" : "string"
  }, {
    "name" : "day",
    "type" : "string"
  }, {
    "name" : "hour",
    "type" : "string"
  }, {
    "name" : "dt",
    "type" : "long"
  }, {
    "name" : "user_agent",
    "type" : "string"
  }, {
    "name" : "ip",
    "type" : "string"
  } ]
}

The problem is that I get an error creating the table:

FAILED: Error in metadata: org.apache.hadoop.hive.ql.metadata.HiveException: Partition column name year conflicts with table columns.
14/01/27 03:08:12 ERROR exec.Task: FAILED: Error in metadata: org.apache.hadoop.hive.ql.metadata.HiveException: Partition column name year conflicts with table columns.
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Partition column name year conflicts with table columns.
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:582)
    at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3719)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:254)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
    at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:445)
    at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:455)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:713)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Partition column name year conflicts with table columns.
    at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:213)
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:560)
    ... 21 more
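
To make the conflict concrete, the overlap that the message points to can be reproduced outside Hive. What follows is only a minimal Python sketch, not Hive's actual validity check: it assumes the Item schema and the PARTITIONED BY columns exactly as posted above, and the helper name conflicting_columns is hypothetical.

```python
import json

# The Item schema as posted above (contents of Item.avsc).
ITEM_SCHEMA = """
{
  "type" : "record",
  "name" : "Item",
  "fields" : [
    { "name" : "year",       "type" : "string" },
    { "name" : "month",      "type" : "string" },
    { "name" : "day",        "type" : "string" },
    { "name" : "hour",       "type" : "string" },
    { "name" : "dt",         "type" : "long"   },
    { "name" : "user_agent", "type" : "string" },
    { "name" : "ip",         "type" : "string" }
  ]
}
"""

# Partition columns named in the PARTITIONED BY clause.
PARTITION_COLUMNS = ["year", "month", "day"]


def conflicting_columns(schema_json, partition_columns):
    """Return the partition column names that also appear as schema fields.

    Hypothetical helper for illustration; it only compares names and is
    not how Hive implements its own check.
    """
    field_names = {field["name"] for field in json.loads(schema_json)["fields"]}
    return [name for name in partition_columns if name in field_names]


if __name__ == "__main__":
    # All three partition columns are also fields of the record.
    print(conflicting_columns(ITEM_SCHEMA, PARTITION_COLUMNS))
    # prints ['year', 'month', 'day']
```

So although the error names only year, month and day overlap with the schema fields in the same way.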


So, how do I tell Hive to use a column in the Avro schema for partitioning?

Thanks in advance,

George
