[ https://issues.apache.org/jira/browse/HIVE-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chengxiang Li updated HIVE-7142:
--------------------------------
    Description:
Currently, Hive only supports serializing data into UTF-8 bytes and deserializing data from UTF-8 bytes, but real-world users may want to load data in other encodings into Hive directly. This JIRA is dedicated to supporting serialization and deserialization of data in all kinds of character encodings at the SerDe layer.
Users only need to configure the serialization encoding at the table level, by setting it through a SerDe property, for example:
{code:sql}
CREATE TABLE person(id INT, name STRING, desc STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES("serialization.encoding"='GBK');
{code}
or
{code:sql}
ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK');
{code}
LIMITATIONS: Only LazySimpleSerDe supports the "serialization.encoding" property in this patch.

  was:
Hive only supports serialization/deserialization in UTF-8, but real-world users want to load data in other encodings into Hive directly. Many PRC customers, for example, would like to load GBK-encoded data.
The serialization encoding can be configured at the table level by setting it through a SerDe property, for example:
{noformat}
alter table test set serdeproperties ('serialization.encoding'='GBK');
{noformat}
LIMITATIONS: Only LazySimpleSerDe supports the "serialization.encoding" property in this patch.

> Hive multi serialization encoding support
> -----------------------------------------
>
>                 Key: HIVE-7142
>                 URL: https://issues.apache.org/jira/browse/HIVE-7142
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>         Attachments: HIVE-7142.1.patch.txt, HIVE-7142.2.patch
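
As a rough illustration of what the table-level "serialization.encoding" setting described in this issue implies, the sketch below transcodes field bytes between a configured charset and UTF-8. It is written in Python for brevity and is not the code in the attached patch. The helper names to_utf8 and from_utf8 are hypothetical, and the assumption that the SerDe converts to UTF-8 on read and back to the configured charset on write is inferred from the description rather than taken from the patch.

```python
# Conceptual sketch only: hypothetical helpers, not Hive APIs.


def to_utf8(raw: bytes, encoding: str) -> bytes:
    """Read path (assumed): decode bytes stored in `encoding`, re-encode as UTF-8."""
    return raw.decode(encoding).encode("utf-8")


def from_utf8(utf8: bytes, encoding: str) -> bytes:
    """Write path (assumed): decode UTF-8 bytes, re-encode in the table's `encoding`."""
    return utf8.decode("utf-8").encode(encoding)


# A name field as it might appear in a GBK-encoded data file.
gbk_field = "张三".encode("gbk")

# What a UTF-8-only engine would need to see after deserialization.
utf8_field = to_utf8(gbk_field, "gbk")

# Writing the value back out in the table's configured encoding.
round_tripped = from_utf8(utf8_field, "gbk")

print(len(gbk_field), len(utf8_field), round_tripped == gbk_field)
```

The same two-character name occupies a different number of bytes in each charset, so reading GBK bytes as if they were UTF-8 would corrupt the text. That is the case the "serialization.encoding" property is meant to avoid.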