[ https://issues.apache.org/jira/browse/HIVE-28728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paramvir Singh reassigned HIVE-28728:
-------------------------------------

    Assignee:     (was: Paramvir Singh)

> In INSERT OVERWRITE queries, STR_TO_MAP() UDF is not using UTF-8 encoding properly resulting in garbled characters
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-28728
>                 URL: https://issues.apache.org/jira/browse/HIVE-28728
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>   Affects Versions: 4.0.0, 4.0.1
>            Reporter: Paramvir Singh
>            Priority: Major
>
> Chinese characters become garbled when an INSERT OVERWRITE query uses the STR_TO_MAP() function.
>
> Repro steps:
>
> 1. Text data file
> {code:java}
> 100 hive
> 200 spark
> 300 oozie
> 400 airflow
> 500 优惠活动
> {code}
> 2. Create a table on top of it
> {code:java}
> CREATE external TABLE t1(
> id string,
> name string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ' '
> STORED AS TEXTFILE
> LOCATION 's3://prmsingh-hive/garbled/rawdata/';
> {code}
> 3. Selecting the data from the source table works fine
> {code:java}
> select STR_TO_MAP(concat(id,":",name),',',':') from t1;
> OK
> {"100":"hive"}
> {"200":"spark"}
> {"300":"oozie"}
> {"400":"airflow"}
> {"500":"优惠活动"}
> {code}
> 4. But when you create another table, run an INSERT OVERWRITE (IOW) query to insert the data, and then run a select query on the destination table, it returns garbled characters
> {code:java}
> create external table result3
> (cd MAP<STRING, STRING>)
> location 's3://prmsingh-hive/garbled/result3/';
> insert overwrite table result3 select STR_TO_MAP(concat(id,":",name),',',':') from t1;
> hive> select * from result3;
> OK
> {"100":"hive"}
> {"200":"spark"}
> {"300":"oozie"}
> {"400":"airflow"}
> {"500":"????"}
> {code}
>
> When I create the table and insert the data with vectorization disabled, the result is fine.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
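
A note on the `????` in step 4: four question marks for four Chinese characters is the pattern Java produces when a string is encoded with a charset that cannot represent those characters. The sketch below only illustrates that symptom. The class name `GarbleDemo`, the helper `roundTrip`, and the choice of US-ASCII are assumptions made for illustration; the report does not establish which charset, if any, the vectorized STR_TO_MAP() path actually uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Hive.
class GarbleDemo {

    // Encode the text to bytes with the given charset, then decode those
    // bytes with the same charset. Characters the charset cannot represent
    // are replaced by the charset's replacement byte during encoding.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The value from row 500 of the report, written with Unicode
        // escapes so the source file compiles under any file encoding.
        String value = "\u4F18\u60E0\u6D3B\u52A8";

        // UTF-8 can represent every character, so the round trip is lossless.
        System.out.println(roundTrip(value, StandardCharsets.UTF_8).equals(value));

        // US-ASCII (an assumed stand-in for "some non-UTF-8 charset") cannot
        // represent these characters; each one becomes '?'.
        System.out.println(roundTrip(value, StandardCharsets.US_ASCII));

        // Plain ASCII values survive either way, which matches rows 100-400.
        System.out.println(roundTrip("airflow", StandardCharsets.US_ASCII));
    }
}
```

If the same kind of lossy encoding happens somewhere on the vectorized write path, it would explain why only row 500 is affected while the ASCII-only rows come through unchanged.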