[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730501#comment-13730501 ]
Alexey Zotov commented on HIVE-3442: ------------------------------------ I found a better approach. We need to specify the following _avro.schema.url_ _hdfs:///some/path/schema.json_ to both _TBLPROPERTIES and _SERDEPROPERTIES_. It works well with NameNodes HA mode. > AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating > external table > --------------------------------------------------------------------------------------- > > Key: HIVE-3442 > URL: https://issues.apache.org/jira/browse/HIVE-3442 > Project: Hive > Issue Type: Bug > Affects Versions: 0.10.0 > Reporter: Zhenxiao Luo > Assignee: Zhenxiao Luo > Fix For: 0.10.0 > > > After creating a table and load data into it, I could check that the table is > created successfully, and data is inside: > DROP TABLE IF EXISTS ml_items; > CREATE TABLE ml_items(id INT, > title STRING, > release_date STRING, > video_release_date STRING, > imdb_url STRING, > unknown_genre TINYINT, > action TINYINT, > adventure TINYINT, > animation TINYINT, > children TINYINT, > comedy TINYINT, > crime TINYINT, > documentary TINYINT, > drama TINYINT, > fantasy TINYINT, > film_noir TINYINT, > horror TINYINT, > musical TINYINT, > mystery TINYINT, > romance TINYINT, > sci_fi TINYINT, > thriller TINYINT, > war TINYINT, > western TINYINT) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE; > LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items; > select * from ml_items ORDER BY id ASC; > While, the following create external table with AvroSerDe is not working: > DROP TABLE IF EXISTS ml_items_as_avro; > CREATE EXTERNAL TABLE ml_items_as_avro > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' > WITH SERDEPROPERTIES ( > 'schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc') > STORED as INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' > LOCATION 'file:${system:test.tmp.dir}/hive-ml-items'; > describe ml_items_as_avro; > INSERT OVERWRITE TABLE ml_items_as_avro > SELECT id, title, > imdb_url, unknown_genre, action, adventure, animation, children, comedy, > crime, > documentary, drama, fantasy, film_noir, horror, musical, mystery, romance, > sci_fi, thriller, war, western > FROM ml_items; > ml_items_as_avro is not created with expected schema, as shown in the > "describe ml_items_as_avro" output. The output is below: > PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro > PREHOOK: type: DROPTABLE > POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro > POSTHOOK: type: DROPTABLE > PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' > WITH SERDEPROPERTIES ( > 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') > STORED as INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' > LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items' > PREHOOK: type: CREATETABLE > POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' > WITH SERDEPROPERTIES ( > 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') > STORED as INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' > LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items' > POSTHOOK: type: CREATETABLE > POSTHOOK: Output: default@ml_items_as_avro > PREHOOK: query: describe ml_items_as_avro > PREHOOK: type: DESCTABLE > POSTHOOK: query: describe ml_items_as_avro > POSTHOOK: type: DESCTABLE > error_error_error_error_error_error_error string from deserializer > cannot_determine_schema string from deserializer > check string from deserializer > schema string from deserializer > url string from deserializer > and string from deserializer > literal string from deserializer > FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target > table because column number/types are different 'ml_items_as_avro': Table > insclause-0 has 7 columns, but query has 22 columns. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira