Mihaly Szjatinya has posted comments on this change. ( http://gerrit.cloudera.org:8080/22049 )
Change subject: IMPALA-10319: Support arbitrary encodings on Text/Sequence files ...................................................................... Patch Set 19: (8 comments) http://gerrit.cloudera.org:8080/#/c/22049/19//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/22049/19//COMMIT_MSG@7 PS19, Line 7: WIP > Is it still WIP? Not anymore, ack http://gerrit.cloudera.org:8080/#/c/22049/19/be/src/exec/text/hdfs-text-scanner.h File be/src/exec/text/hdfs-text-scanner.h: http://gerrit.cloudera.org:8080/#/c/22049/19/be/src/exec/text/hdfs-text-scanner.h@272 PS19, Line 272: boost::scoped_ptr > nit: Can this be unique_ptr? Yes, changing also the others http://gerrit.cloudera.org:8080/#/c/22049/17/be/src/exec/text/hdfs-text-scanner.cc File be/src/exec/text/hdfs-text-scanner.cc: http://gerrit.cloudera.org:8080/#/c/22049/17/be/src/exec/text/hdfs-text-scanner.cc@545 PS17, Line 545: if (decoder_.get() != nullptr) > I think in new methods/classes, the less code is the better. In files where ok, leaving this as is then http://gerrit.cloudera.org:8080/#/c/22049/17/be/src/exec/text/hdfs-text-scanner.cc@548 PS17, Line 548: decompressor_.get() != nullptr > nit: 'decompressor_ == nullptr' should be enough. Same goos for L749. Leaving as is for consistency (see comment above) http://gerrit.cloudera.org:8080/#/c/22049/19/be/src/util/char-codec.h File be/src/util/char-codec.h: http://gerrit.cloudera.org:8080/#/c/22049/19/be/src/util/char-codec.h@34 PS19, Line 34: // Output buffer size for streaming encoded file. : static const int64_t STREAM_OUT_BUF_SIZE; > Seems to be unused. indeed, removing http://gerrit.cloudera.org:8080/#/c/22049/19/be/src/util/char-codec.cc File be/src/util/char-codec.cc: http://gerrit.cloudera.org:8080/#/c/22049/19/be/src/util/char-codec.cc@119 PS19, Line 119: // continue > Just a thought: it might make sense to make the 'continue' explicit, so if Ok, why not. http://gerrit.cloudera.org:8080/#/c/22049/17/tests/query_test/test_charcodec.py File tests/query_test/test_charcodec.py: http://gerrit.cloudera.org:8080/#/c/22049/17/tests/query_test/test_charcodec.py@44 PS17, Line 44: ' > Is this self-assignment required at all? Not really, thanks http://gerrit.cloudera.org:8080/#/c/22049/19/tests/query_test/test_charcodec.py File tests/query_test/test_charcodec.py: http://gerrit.cloudera.org:8080/#/c/22049/19/tests/query_test/test_charcodec.py@48 PS19, Line 48: max_length=20): > I don't see tests with large words. Can we add a few? sure -- To view, visit http://gerrit.cloudera.org:8080/22049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I787cd01caa52a19d6645519a6cedabe0a5253a65 Gerrit-Change-Number: 22049 Gerrit-PatchSet: 19 Gerrit-Owner: Mihaly Szjatinya <msz...@pm.me> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Mihaly Szjatinya <msz...@pm.me> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Tue, 29 Apr 2025 11:33:42 +0000 Gerrit-HasComments: Yes