Aihua Xu created HIVE-11129:
-------------------------------

    Summary: Issue a warning when copied from UTF-8 to ISO 8859-1
    Key: HIVE-11129
    URL: https://issues.apache.org/jira/browse/HIVE-11129
    Project: Hive
    Issue Type: Bug
    Components: File Formats
    Reporter: Aihua Xu

Copying data from a table that uses UTF-8 encoding to one that uses ISO 8859-1 encoding causes data corruption without warning.

{noformat}
CREATE TABLE person_utf8 (name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding'='UTF8');
{noformat}

Put the following data in the table:

Müller,Thomas
Jørgensen,Jørgen
Vega,Andrés
中村,浩人
אביה,נועם

{noformat}
CREATE TABLE person_2
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1')
AS SELECT * FROM person_utf8;
{noformat}

Mangled data is expected here, because ISO 8859-1 cannot represent all of these characters, but Hive should issue a warning when this happens.
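
To illustrate the kind of check a warning could be based on, here is a minimal standalone sketch using only the JDK's java.nio.charset API. It is not Hive code and not a proposed patch: the class name EncodingLossCheck and both method names are hypothetical, and where such a check would hook into the SerDe is left open. The sample rows are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, for illustration only; not part of Hive.
public class EncodingLossCheck {

    // True if every character of value can be represented in the target charset.
    public static boolean isEncodable(String value, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(value);
    }

    // Mimics a silent conversion: String.getBytes(Charset) replaces each
    // unmappable character with the charset's replacement byte
    // ('?' for ISO-8859-1) instead of failing.
    public static String roundTrip(String value, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(value.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String target = "ISO-8859-1";
        String[] rows = {
            "M\u00fcller,Thomas",                                    // Latin-1 letters only
            "\u4e2d\u6751,\u6d69\u4eba",                             // CJK row from the report
            "\u05d0\u05d1\u05d9\u05d4,\u05e0\u05d5\u05e2\u05dd"      // Hebrew row from the report
        };
        for (String row : rows) {
            if (!isEncodable(row, target)) {
                // This is the point at which a warning could be emitted.
                System.err.println("WARN: value not representable in " + target
                        + ", written as: " + roundTrip(row, target));
            }
        }
    }
}
```

With the sample data above, the Latin-1 row passes the check unchanged, while the CJK and Hebrew rows are flagged because their characters have no ISO 8859-1 mapping and would be written as question marks.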