Qing Miao created HIVE-22734:
--------------------------------

             Summary: ORC truncates multi-byte character values stored in varchar columns
                 Key: HIVE-22734
                 URL: https://issues.apache.org/jira/browse/HIVE-22734
             Project: Hive
          Issue Type: Improvement
          Components: Database/Schema
    Affects Versions: 2.3.6
         Environment: Ubuntu and CentOS 7

 
            Reporter: Qing Miao


Hi, I'm new here, but I have been using Hive for several years.

I created an ORC table with a single varchar(6) column and inserted a multi-byte string into it, as shown below:

hive> insert into mq1 values ('一二三四五六七') ;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = mq5445_20200116144748_cb87f769-9d3f-4b3b-b384-92c22b8ef06a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2020-01-16 14:47:52,024 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local484725283_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://wsl:9000/user/hive/warehouse/mq1/.hive-staging_hive_2020-01-16_14-47-48_936_2091348056955954494-1/-ext-10000
Loading data to table default.mq1
MapReduce Jobs Launched: 
Stage-Stage-1: HDFS Read: 524 HDFS Write: 315 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 5.467 seconds
hive> select * from mq1 ;
OK
一二
一二
Time taken: 0.301 seconds, Fetched: 2 row(s)
hive> show create table mq1 ;
OK
CREATE TABLE `mq1`(
 `col1` varchar(6))
ROW FORMAT SERDE 
 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
 'hdfs://wsl:9000/user/hive/warehouse/mq1'
TBLPROPERTIES (
 'transient_lastDdlTime'='1579157273')
Time taken: 0.281 seconds, Fetched: 12 row(s)

 

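It seems ORC cannot store six multi-byte characters in a varchar(6) column the way MySQL does. For Chinese text in UTF-8, where each character takes 3 bytes, only 2 characters were stored, which suggests the value is being truncated to 6 bytes rather than 6 characters.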
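
To make the arithmetic concrete, the following is a minimal Python sketch that models two possible truncation rules: by characters (Hive documents the varchar length as a maximum number of characters) and by UTF-8 bytes. The function names are hypothetical, and this is only an illustration of the symptom, not Hive's or ORC's actual code.

```python
# Hypothetical model of two truncation rules; NOT Hive's or ORC's code.


def truncate_by_chars(value: str, max_len: int) -> str:
    """Keep at most max_len characters (the documented varchar(n) meaning)."""
    return value[:max_len]


def truncate_by_utf8_bytes(value: str, max_bytes: int) -> str:
    """Keep at most max_bytes bytes of the UTF-8 encoding.

    A multi-byte sequence cut in half at the end is dropped, so the
    result is always valid text.
    """
    encoded = value.encode("utf-8")[:max_bytes]
    return encoded.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    s = "一二三四五六七"
    print(len(s), len(s.encode("utf-8")))  # 7 21
    print(truncate_by_chars(s, 6))         # 一二三四五六
    print(truncate_by_utf8_bytes(s, 6))    # 一二
```

Each of these CJK characters is 3 bytes in UTF-8, so a 6-byte limit keeps exactly 2 of them, which matches the 一二 returned by the SELECT above, while a 6-character limit would keep 一二三四五六.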

Other Hive storage formats, for example TEXTFILE and Parquet, handle this case correctly.

I see the problem with ORC on Hive 2.3.6 and 2.2.0, running on Hadoop 2.7.0.

The ORC project appears to have fixed something related in version 1.6.2, so I replaced the ORC core jar in Hive's lib directory with orc-core-1.6.2.jar, but the problem persists.


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
