Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21039 )
Change subject: IMPALA-12786: Optimize count(*) for JSON scans ...................................................................... Patch Set 8: (6 comments) http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser-test.cc File be/src/exec/json/json-parser-test.cc: http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser-test.cc@207 PS8, Line 207: TestSkip(R"("abc\")", TYPE_STRING, kParseErrorStringMissQuotationMark); Add negative/positive testcase against unicode and/or encoded unicode? http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.h File be/src/exec/json/json-parser.h: http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.h@253 PS8, Line 253: https://github.com/Tencent/rapidjson/blob/master/include/rapidjson/reader.h#L539 nit: Use permalink https://github.com/Tencent/rapidjson/blob/5ec44fb/include/rapidjson/reader.h#L539 http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.h@269 PS8, Line 269: inline bool Consume(char expect) { nit: Please add comment about what this function do. http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.h@282 PS8, Line 282: /// The following function is used to skip a specific JSON value. It maintains logic : /// consistent with rapidjson, consuming the stream and returning true upon successfully : /// skipping the specified value, or returning false and setting the respective error : /// code if an error is encountered. nit: Consider mentioning this RFC about valid JSON values. https://datatracker.ietf.org/doc/html/rfc8259#section-3 http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.h@293 PS8, Line 293: bool SkipValue(); rapidjson::GenericReader has ParseHex4 to handle unicode. Will that be a concern? I notice you mention in commit message: "it cannot identify string encoding errors or numeric overflow errors" http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.cc File be/src/exec/json/json-parser.cc: http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.cc@332 PS8, Line 332: template<class Stream> : bool JsonSkipper<Stream>::SkipNumber() { rapidjson::GenericReader seems to handle NaN and Infinity. This one does not. Is that OK? If it is OK, please add comment and testcase. -- To view, visit http://gerrit.cloudera.org:8080/21039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I97ff097661c3c577aeafeeb1518408ce7a8a255e Gerrit-Change-Number: 21039 Gerrit-PatchSet: 8 Gerrit-Owner: Zihao Ye <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Zihao Ye <[email protected]> Gerrit-Comment-Date: Fri, 29 Mar 2024 00:46:33 +0000 Gerrit-HasComments: Yes
