Mihaly Szjatinya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22049 )

Change subject: WIP IMPALA-10319: Support arbitrary encodings on Text/Sequence 
files
......................................................................


Patch Set 5:

(5 comments)

1. Fixed some of the comments.
2. First draft text encoding + encoding-decoding tests.
3. Restructured tests so far.

http://gerrit.cloudera.org:8080/#/c/22049/4/be/src/exec/text/hdfs-text-scanner.cc
File be/src/exec/text/hdfs-text-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/22049/4/be/src/exec/text/hdfs-text-scanner.cc@545
PS4, Line 545: DecodeBuffer
> Is it possible for the end of buffer to cut a multi-line character in two?
Good point, although I'm not sure HdfsTextScanner doesn't handle this on a 
different level. Yet there is a special case for line.delim being two bytes: 
\r\n.

I'll try to fix this in the next patch.


http://gerrit.cloudera.org:8080/#/c/22049/4/be/src/util/decoder.cc
File be/src/util/decoder.cc:

http://gerrit.cloudera.org:8080/#/c/22049/4/be/src/util/decoder.cc@45
PS4, Line 45:
> hdfs-text-scanner.cc doesn't create a decoder if encoding is UTF-8, so I th
Removed.


http://gerrit.cloudera.org:8080/#/c/22049/4/be/src/util/decoder.cc@51
PS4, Line 51:
> Based on https://beta.boost.org/doc/libs/1_85_0/libs/locale/doc/html/group_
Agree, done.


http://gerrit.cloudera.org:8080/#/c/22049/4/be/src/util/decoder.cc@52
PS4, Line 52:
> There is also a "method_type how" optional parameter that decides what to d
Good point.


http://gerrit.cloudera.org:8080/#/c/22049/4/fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java
File fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java:

http://gerrit.cloudera.org:8080/#/c/22049/4/fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java@233
PS4, Line 233:     // Extract encoding
             :     String encoding = DEFAULT_ENCODING;
             :     String encodingValue = 
parameters.get(serdeConstants.SERIALIZATION_ENCODING);
             :     if (encodingValue != null) {
             :       encoding = encodingValue;
             :     }
> I saw examples for both among the tests

I guess those were statements from HIVE_QUERY section, which support both. 
Updated to use only serdeproperties.

> This decision should be also mentioned in the commit message,

Agree, done.



--
To view, visit http://gerrit.cloudera.org:8080/22049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I787cd01caa52a19d6645519a6cedabe0a5253a65
Gerrit-Change-Number: 22049
Gerrit-PatchSet: 5
Gerrit-Owner: Mihaly Szjatinya <msz...@pm.me>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Mihaly Szjatinya <msz...@pm.me>
Gerrit-Comment-Date: Mon, 23 Dec 2024 00:08:55 +0000
Gerrit-HasComments: Yes

Reply via email to