Mihaly Szjatinya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22049 )

Change subject: WIP IMPALA-10319: Support arbitrary encodings on Text/Sequence 
files
......................................................................


Patch Set 7:

(2 comments)

Changes:
1. Fixed handling of symbols split across buffer boundaries by storing the 
partial symbol.
2. Added self-generating tests for arbitrarily large data volumes.
3. Improved 'alter table' analysis to check for the current line.delim instead 
of just '\n'.
4. Changed encodingValue from required to optional.
5. Miscellaneous bug fixes.
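
As an illustration of items 1 and 2 only, here is a minimal Python sketch of 
a self-generating round-trip test. It is not the code from the patch: the 
helper names (decode_in_chunks, generate_text, check_round_trip), the alphabet 
and the list of encodings are assumptions made for the example. Python's 
standard incremental decoders stand in for the "store the partial symbol" 
idea, since they buffer a symbol that is cut off at the end of a chunk.

```python
import codecs
import random


def decode_in_chunks(data: bytes, encoding: str, chunk_size: int) -> str:
    """Decode `data` chunk by chunk, carrying over any partial symbol."""
    # The incremental decoder keeps the bytes of a symbol that is cut off at
    # the end of a chunk and completes it when the next chunk arrives.
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    for start in range(0, len(data), chunk_size):
        parts.append(decoder.decode(data[start:start + chunk_size]))
    # final=True raises an error if a partial symbol is still pending.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


def generate_text(num_chars: int, seed: int) -> str:
    """Generate reproducible text mixing ASCII and multi-byte characters."""
    rng = random.Random(seed)
    # Hypothetical alphabet: every character is encodable in UTF-8 and GBK.
    alphabet = "abc,\n" + "中文字符"
    return "".join(rng.choice(alphabet) for _ in range(num_chars))


def check_round_trip(num_chars: int, seed: int) -> None:
    """Encode generated text and decode it with awkward chunk sizes."""
    text = generate_text(num_chars, seed)
    for encoding in ("utf-8", "gbk", "utf-16-le"):
        data = text.encode(encoding)
        # Small chunk sizes force symbols to be split across chunks.
        for chunk_size in (1, 2, 3, 5, 4096):
            assert decode_in_chunks(data, encoding, chunk_size) == text


if __name__ == "__main__":
    check_round_trip(num_chars=100_000, seed=1)
```

Because the input is generated from a seed, the volume can be scaled up 
without checking large files into the repository.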

http://gerrit.cloudera.org:8080/#/c/22049/4/be/src/exec/text/hdfs-text-scanner.cc
File be/src/exec/text/hdfs-text-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/22049/4/be/src/exec/text/hdfs-text-scanner.cc@545
PS4, Line 545: r) {
> Good point, although I'm not sure HdfsTextScanner doesn't handle this on a 
Implemented the first option for this. Applied a heuristic that finds split 
symbols at the beginning and at the end of the buffer, which avoids copying.
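
To make the idea concrete, here is a rough Python sketch of boundary 
detection. It assumes UTF-8 only, whereas the patch targets arbitrary 
encodings, and it is not the heuristic used in hdfs-text-scanner.cc. The 
function names are hypothetical. The point is that only a few bytes at each 
end of the buffer need to be inspected, so the bulk of the buffer can be 
decoded in place.

```python
def incomplete_tail_length(buf: bytes) -> int:
    """Return how many trailing bytes belong to a cut-off UTF-8 symbol."""
    # A UTF-8 symbol is at most 4 bytes long, so a cut-off symbol leaves at
    # most 3 bytes at the end of the buffer.
    for back in range(1, min(3, len(buf)) + 1):
        byte = buf[-back]
        if (byte & 0xC0) == 0x80:
            continue  # continuation byte: keep looking for the lead byte
        if (byte & 0x80) == 0:
            return 0  # ASCII byte: nothing is cut off
        # Lead byte: its high bits give the total length of the symbol.
        if (byte & 0xE0) == 0xC0:
            expected = 2
        elif (byte & 0xF0) == 0xE0:
            expected = 3
        elif (byte & 0xF8) == 0xF0:
            expected = 4
        else:
            return 0  # invalid lead byte: leave it to the decoder
        return back if back < expected else 0
    return 0


def leading_continuation_length(buf: bytes) -> int:
    """Return how many leading bytes continue a symbol from the previous buffer."""
    count = 0
    while count < min(3, len(buf)) and (buf[count] & 0xC0) == 0x80:
        count += 1
    return count
```

For example, for the buffer b"a\xe4\xb8" (the letter "a" followed by the 
first two bytes of a three-byte symbol), incomplete_tail_length returns 2, 
so only those two bytes have to be kept for the next buffer.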


http://gerrit.cloudera.org:8080/#/c/22049/6/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/22049/6/common/thrift/CatalogObjects.thrift@359
PS6, Line 359:   9: optional string encodingValue
> Please use 'optional' instead of 'required'
Ack



--
To view, visit http://gerrit.cloudera.org:8080/22049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I787cd01caa52a19d6645519a6cedabe0a5253a65
Gerrit-Change-Number: 22049
Gerrit-PatchSet: 7
Gerrit-Owner: Mihaly Szjatinya <msz...@pm.me>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Mihaly Szjatinya <msz...@pm.me>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Comment-Date: Sun, 26 Jan 2025 23:11:05 +0000
Gerrit-HasComments: Yes