Mihaly Szjatinya has uploaded a new patch set (#17). ( 
http://gerrit.cloudera.org:8080/22049 )

Change subject: WIP IMPALA-10319: Support arbitrary encodings on Text/Sequence 
files
......................................................................

WIP IMPALA-10319: Support arbitrary encodings on Text/Sequence files

As proposed in Jira, this implements decoding and encoding of text
buffers for Impala/Hive text tables. Given a table with the
'serialization.encoding' property set, Impala should, like Hive,
encode inserted data into the specified charset before saving it to a
text file. The opposite decoding operation should be performed when
reading data buffers from text files. Both operations use the
boost::locale::conv library.
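
As a rough illustration of the two conversions described above, here is
a minimal Python sketch. It is not the patch's C++ code: it uses
Python's built-in codecs in place of boost::locale::conv, the function
names are hypothetical, and it assumes in-memory buffers are UTF-8.

```python
# Sketch only: Python codecs stand in for boost::locale::conv.
# Function names are hypothetical; in-memory data is assumed to be UTF-8.

def encode_buffer(utf8_buf: bytes, charset: str) -> bytes:
    """Write path: re-encode an in-memory UTF-8 buffer into the table's charset."""
    return utf8_buf.decode("utf-8").encode(charset)


def decode_buffer(raw_buf: bytes, charset: str) -> bytes:
    """Read path: convert bytes read from a text file back to UTF-8."""
    return raw_buf.decode(charset).encode("utf-8")


row = "1,你好\n".encode("utf-8")
on_disk = encode_buffer(row, "gbk")          # bytes as they would be stored
assert on_disk != row                        # GBK bytes differ from UTF-8 bytes
assert decode_buffer(on_disk, "gbk") == row  # reading restores the original
```

Both helpers raise a UnicodeError if the input cannot be represented in
the target charset; how the patch itself reports such failures is not
covered by this sketch.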

Since Hive doesn't encode line delimiters, charsets that store
delimiters differently from ASCII are not allowed.
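
One way to picture this restriction is a check that each delimiter
encodes to the same byte in the target charset as in ASCII. The Python
sketch below is an assumption about how such a check could look, not the
patch's implementation; the function name and the delimiter set are
hypothetical.

```python
# Sketch only: the delimiter set and function name are assumptions.
# "\x01" (Ctrl-A) is Hive's default field delimiter.
DEFAULT_DELIMITERS = ("\n", "\r", ",", "\t", "\x01")


def delimiters_match_ascii(charset: str, delimiters=DEFAULT_DELIMITERS) -> bool:
    """Return True if every delimiter has the same byte form as in ASCII."""
    for delim in delimiters:
        try:
            if delim.encode(charset) != delim.encode("ascii"):
                return False
        except UnicodeEncodeError:
            # The charset cannot represent this delimiter at all.
            return False
    return True


print(delimiters_match_ascii("gbk"))        # ASCII-compatible single bytes
print(delimiters_match_ascii("utf-16-le"))  # "\n" becomes two bytes
```

Under this check, ASCII-compatible charsets such as UTF-8 or GBK pass,
while UTF-16 and UTF-32 variants fail because they widen every
delimiter to multiple bytes.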

One difference from Hive is that Impala implements
'serialization.encoding' only as a per-partition serdeproperty, to
avoid the confusion of allowing both serde and tbl properties (see the
related IMPALA-13748).

Change-Id: I787cd01caa52a19d6645519a6cedabe0a5253a65
---
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/hdfs-text-table-writer.h
M be/src/exec/scanner-context.h
M be/src/exec/text/hdfs-text-scanner.cc
M be/src/exec/text/hdfs-text-scanner.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/row-batch.h
M be/src/util/CMakeLists.txt
A be/src/util/char-codec.cc
A be/src/util/char-codec.h
M common/thrift/CatalogObjects.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
A tests/query_test/test_charcodec.py
21 files changed, 898 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/22049/17
--
To view, visit http://gerrit.cloudera.org:8080/22049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I787cd01caa52a19d6645519a6cedabe0a5253a65
Gerrit-Change-Number: 22049
Gerrit-PatchSet: 17
Gerrit-Owner: Mihaly Szjatinya <msz...@pm.me>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Mihaly Szjatinya <msz...@pm.me>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
