Mihaly Szjatinya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22049 )

Change subject: IMPALA-10319: Support arbitrary encodings on Text/Sequence files
......................................................................


Patch Set 19:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/22049/19//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22049/19//COMMIT_MSG@7
PS19, Line 7: WIP
> Is it still WIP?
Not anymore, ack


http://gerrit.cloudera.org:8080/#/c/22049/19/be/src/exec/text/hdfs-text-scanner.h
File be/src/exec/text/hdfs-text-scanner.h:

http://gerrit.cloudera.org:8080/#/c/22049/19/be/src/exec/text/hdfs-text-scanner.h@272
PS19, Line 272: boost::scoped_ptr
> nit: Can this be unique_ptr?
Yes, changing also the others


http://gerrit.cloudera.org:8080/#/c/22049/17/be/src/exec/text/hdfs-text-scanner.cc
File be/src/exec/text/hdfs-text-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/22049/17/be/src/exec/text/hdfs-text-scanner.cc@545
PS17, Line 545: if (decoder_.get() != nullptr)
> I think in new methods/classes, the less code is the better. In files where
ok, leaving this as is then


http://gerrit.cloudera.org:8080/#/c/22049/17/be/src/exec/text/hdfs-text-scanner.cc@548
PS17, Line 548: decompressor_.get() != nullptr
> nit: 'decompressor_ == nullptr' should be enough. Same goos for L749.
Leaving as is for consistency (see comment above)


http://gerrit.cloudera.org:8080/#/c/22049/19/be/src/util/char-codec.h
File be/src/util/char-codec.h:

http://gerrit.cloudera.org:8080/#/c/22049/19/be/src/util/char-codec.h@34
PS19, Line 34:   // Output buffer size for streaming encoded file.
             :   static const int64_t STREAM_OUT_BUF_SIZE;
> Seems to be unused.
indeed, removing


http://gerrit.cloudera.org:8080/#/c/22049/19/be/src/util/char-codec.cc
File be/src/util/char-codec.cc:

http://gerrit.cloudera.org:8080/#/c/22049/19/be/src/util/char-codec.cc@119
PS19, Line 119: // continue
> Just a thought: it might make sense to make the 'continue' explicit, so if
Ok, why not.


http://gerrit.cloudera.org:8080/#/c/22049/17/tests/query_test/test_charcodec.py
File tests/query_test/test_charcodec.py:

http://gerrit.cloudera.org:8080/#/c/22049/17/tests/query_test/test_charcodec.py@44
PS17, Line 44: '
> Is this self-assignment required at all?
Not really, thanks


http://gerrit.cloudera.org:8080/#/c/22049/19/tests/query_test/test_charcodec.py
File tests/query_test/test_charcodec.py:

http://gerrit.cloudera.org:8080/#/c/22049/19/tests/query_test/test_charcodec.py@48
PS19, Line 48: max_length=20):
> I don't see tests with large words. Can we add a few?
sure



--
To view, visit http://gerrit.cloudera.org:8080/22049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I787cd01caa52a19d6645519a6cedabe0a5253a65
Gerrit-Change-Number: 22049
Gerrit-PatchSet: 19
Gerrit-Owner: Mihaly Szjatinya <msz...@pm.me>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Mihaly Szjatinya <msz...@pm.me>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Tue, 29 Apr 2025 11:33:42 +0000
Gerrit-HasComments: Yes

Reply via email to