mahesh kumar behera created HIVE-26105:
------------------------------------------

             Summary: Show columns shows extra values if column comments 
contain a specific Chinese character 
                 Key: HIVE-26105
                 URL: https://issues.apache.org/jira/browse/HIVE-26105
             Project: Hive
          Issue Type: Bug
          Components: Hive, HiveServer2
            Reporter: mahesh kumar behera
            Assignee: mahesh kumar behera


The issue occurs because the code point of one of the Chinese characters, 名 
(U+540D), ends in the byte 0x0D (decimal 13), which is the value of '\r' (CR). 
While reading the result, the Hadoop line reader (used by the fetch task in 
Hive) interprets that byte as a line terminator and treats whatever follows it 
as a new value, so an extra row containing junk is displayed. SHOW COLUMNS does 
not need the comments, so only the column names should be written to the 
file.

[https://github.com/apache/hadoop/blob/0fbd96a2449ec49f840d93e1c7d290c5218ef4ea/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java#L238]
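
The mechanism can be illustrated with a minimal Java sketch. This is not Hive's code: the class CrByteDemo and its methods are hypothetical, the narrow helper is an assumed lossy write that keeps only the low byte of each character (the report does not say how the byte reaches the file), and BufferedReader.readLine stands in for the Hadoop line reader, since both treat CR, LF, and CRLF as line terminators.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CrByteDemo {

    // Low-order byte of a UTF-16 code unit; for U+540D this is 0x0D (CR).
    static int lowByte(char c) {
        return c & 0xFF;
    }

    // ASSUMPTION (illustration only): a lossy writer that keeps just the
    // low byte of every char, so U+540D is written as the single byte 0x0D.
    static byte[] narrow(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    // Splits raw bytes into lines on LF, CR, or CRLF via BufferedReader.readLine.
    // ISO-8859-1 maps every byte to exactly one char, so no byte is lost.
    static List<String> splitLines(byte[] data) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Column name, a tab, then the fld1 comment from the report
        // (written with Unicode escapes; the third character is U+540D).
        String row = "fld1\t\u73ed\u6b21\u540d\u79f0\n";

        System.out.printf("low byte of U+540D: 0x%02X%n", lowByte('\u540D'));
        System.out.println("lines after lossy write: "
                + splitLines(narrow(row)).size());
        System.out.println("lines after UTF-8 write: "
                + splitLines(row.getBytes(StandardCharsets.UTF_8)).size());
    }
}
```

Under these assumptions the narrowed row splits into two lines, while the UTF-8 encoded row stays a single line, because UTF-8 encodes 名 as the bytes E5 90 8D, none of which is 0x0D.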

 
{code:java}
create table tbl_test  (fld0 string COMMENT  '期 ' , fld string COMMENT '期末日期', 
fld1 string COMMENT '班次名称', fld2  string COMMENT '排班人数');

show columns from tbl_test;
+--------+
| field  |
+--------+
| fld    |
| fld0   |
| fld1   |
| �      |
| fld2   |
+--------+
5 rows selected (171.809 seconds)
 {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
