[ 
https://issues.apache.org/jira/browse/HIVE-21242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766635#comment-16766635
 ] 

Gopal V commented on HIVE-21242:
--------------------------------

Java used to use UCS-2; it switched to UTF-16 by default to support 
supplementary characters.

https://docs.oracle.com/javase/8/docs/technotes/guides/intl/overview.html#textrep

{code}
The primitive data type char in the Java programming language is an unsigned 
16-bit integer that can represent a Unicode code point in the range U+0000 to 
U+FFFF, or the code units of UTF-16.
{code}
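
A minimal sketch of what the quoted passage means in practice, using only standard JDK calls. The class name Utf16Demo and its helper methods are made up for illustration and do not come from Hive or Calcite. It shows that a Java String is counted in UTF-16 code units, so a supplementary character occupies two chars, regardless of how the text is later encoded as bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Hive or Calcite.
class Utf16Demo {
    // Number of UTF-16 code units (Java chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // All characters are in the range U+0000 to U+FFFF: one char each.
        String bmp = "hello";
        // U+1F600 lies outside that range: stored as a surrogate pair.
        String supplementary = new String(Character.toChars(0x1F600));

        System.out.println(codeUnits(bmp) + " " + codePoints(bmp) + " " + utf8Bytes(bmp));
        System.out.println(codeUnits(supplementary) + " " + codePoints(supplementary)
                + " " + utf8Bytes(supplementary));
        System.out.println(Character.isHighSurrogate(supplementary.charAt(0)));
    }
}
```

The in-memory representation (UTF-16 code units) is independent of the byte encoding chosen when the string is written out, which is why the same string can report two chars and four UTF-8 bytes.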


> Calcite Planner Logging Indicates UTF-16 Encoding
> -------------------------------------------------
>
>                 Key: HIVE-21242
>                 URL: https://issues.apache.org/jira/browse/HIVE-21242
>             Project: Hive
>          Issue Type: Improvement
>          Components: CBO
>    Affects Versions: 4.0.0, 3.2.0
>            Reporter: BELUGA BEHR
>            Priority: Major
>
> I noticed some debug logging from Calcite, and it is using UTF-16. I would 
> expect UTF-8.
> {code}
> 2019-02-10T19:08:06,393 DEBUG [7db4d3c5-0f88-49db-88fa-ad6428c23784 main] 
> parse.CalcitePlanner: Plan after decorrelation:
> HiveSortLimit(offset=[0], fetch=[2])
>   HiveProject(_o__c0=[array(3, 2, 1)], _o__c1=[map(1, 2001-01-01, 2, null)], 
> _o__c2=[named_struct(_UTF-16LE'c1', 123456, _UTF-16LE'c2', _UTF-16LE'hello', 
> _UTF-16LE'c3', array(_UTF-16LE'aa', _UTF-16LE'bb', _UTF-16LE'cc'), 
> _UTF-16LE'c4', map(_UTF-16LE'abc', 123, _UTF-16LE'xyz', 456), _UTF-16LE'c5', 
> named_struct(_UTF-16LE'c5_1', _UTF-16LE'bye', _UTF-16LE'c5_2', 88))])
>     HiveTableScan(table=[[default, src]], table:alias=[src])
> {code}
> I'm not sure if this is a Calcite-internal thing which can be configured or 
> if this is only an artifact of the way the logging works.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
