Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21039 )

Change subject: IMPALA-12786: Optimize count(*) for JSON scans
......................................................................


Patch Set 8:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser-test.cc
File be/src/exec/json/json-parser-test.cc:

http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser-test.cc@207
PS8, Line 207: TestSkip(R"("abc\")", TYPE_STRING, 
kParseErrorStringMissQuotationMark);
Add negative/positive testcase against unicode and/or encoded unicode?


http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.h
File be/src/exec/json/json-parser.h:

http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.h@253
PS8, Line 253: 
https://github.com/Tencent/rapidjson/blob/master/include/rapidjson/reader.h#L539
nit: Use permalink
https://github.com/Tencent/rapidjson/blob/5ec44fb/include/rapidjson/reader.h#L539


http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.h@269
PS8, Line 269: inline bool Consume(char expect) {
nit: Please add comment about what this function do.


http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.h@282
PS8, Line 282: /// The following function is used to skip a specific JSON 
value. It maintains logic
             :   /// consistent with rapidjson, consuming the stream and 
returning true upon successfully
             :   /// skipping the specified value, or returning false and 
setting the respective error
             :   /// code if an error is encountered.
nit: Consider mentioning this RFC about valid JSON values.
https://datatracker.ietf.org/doc/html/rfc8259#section-3


http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.h@293
PS8, Line 293: bool SkipValue();
rapidjson::GenericReader has ParseHex4 to handle unicode. Will that be a 
concern?

I notice you mention in commit message:
"it cannot identify string encoding errors or numeric overflow errors"


http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.cc
File be/src/exec/json/json-parser.cc:

http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.cc@332
PS8, Line 332: template<class Stream>
             : bool JsonSkipper<Stream>::SkipNumber() {
rapidjson::GenericReader seems to handle NaN and Infinity. This one does not. 
Is that OK?
If it is OK, please add comment and testcase.



--
To view, visit http://gerrit.cloudera.org:8080/21039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I97ff097661c3c577aeafeeb1518408ce7a8a255e
Gerrit-Change-Number: 21039
Gerrit-PatchSet: 8
Gerrit-Owner: Zihao Ye <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zihao Ye <[email protected]>
Gerrit-Comment-Date: Fri, 29 Mar 2024 00:46:33 +0000
Gerrit-HasComments: Yes

Reply via email to