[ 
https://issues.apache.org/jira/browse/HIVE-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7142:
--------------------------------

    Description: 
Currently Hive only supports serializing data into UTF-8 bytes and 
deserializing from UTF-8 bytes, but real-world users may want to load data in 
other encodings into Hive directly. This JIRA adds support for serializing and 
deserializing data in arbitrary encodings in the SerDe layer. 

Users only need to configure the serialization encoding at the table level by 
setting the "serialization.encoding" SerDe property, for example:

{code:sql}
CREATE TABLE person(id INT, name STRING, desc STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding'='GBK');
{code}

or

{code:sql}
ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); 
{code}

LIMITATIONS: Only LazySimpleSerDe supports the "serialization.encoding" 
property in this patch.
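
As a rough illustration of what encoding-aware serialization involves (this is 
not the code in the attached patch), the sketch below transcodes between a 
table's configured charset and UTF-8 with the standard java.nio.charset API. 
The class name EncodingSketch and the methods toUtf8/fromUtf8 are hypothetical 
names chosen for this example, and it assumes the JVM ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, for illustration only; not part of Hive.
public class EncodingSketch {

    // Deserialize direction: bytes stored in the table's encoding
    // are decoded and re-encoded as UTF-8.
    public static byte[] toUtf8(byte[] raw, String tableEncoding) {
        Charset charset = Charset.forName(tableEncoding);
        return new String(raw, charset).getBytes(StandardCharsets.UTF_8);
    }

    // Serialize direction: UTF-8 bytes are decoded and re-encoded
    // in the table's encoding before being written out.
    public static byte[] fromUtf8(byte[] utf8, String tableEncoding) {
        Charset charset = Charset.forName(tableEncoding);
        return new String(utf8, StandardCharsets.UTF_8).getBytes(charset);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source
        // file's own encoding does not matter.
        String name = "\u4e2d\u6587";
        byte[] gbk = name.getBytes(Charset.forName("GBK"));

        byte[] utf8 = toUtf8(gbk, "GBK");
        byte[] roundTrip = fromUtf8(utf8, "GBK");

        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(name));
        System.out.println(Arrays.equals(gbk, roundTrip));
    }
}
```

The encoding name passed to these methods plays the role of the 
"serialization.encoding" value above. Reading GBK bytes as if they were UTF-8, 
without such a step, would produce garbled text for any non-ASCII character.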

  was:
Hive only supports serialization/deserialization in UTF-8, but real-world users 
want to load data in other encodings into Hive directly. Many PRC customers, 
for example, would like to load GBK-encoded data.
We support configuring the serialization encoding at the table level by setting 
it through a SerDe property, for example:
{noformat}
alter table test set serdeproperties ('serialization.encoding'='GBK'); 
{noformat}

LIMITATIONS: Only LazySimpleSerDe supports the "serialization.encoding" 
property in this patch.


> Hive multi serialization encoding support
> -----------------------------------------
>
>                 Key: HIVE-7142
>                 URL: https://issues.apache.org/jira/browse/HIVE-7142
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>         Attachments: HIVE-7142.1.patch.txt, HIVE-7142.2.patch
>


