Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/22661 )
Change subject: IMPALA-13894: Allow slow check in tuple cache correctness verification when file sizes differ
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/22661/6/be/src/exec/tuple-text-file-util.cc
File be/src/exec/tuple-text-file-util.cc:

http://gerrit.cloudera.org:8080/#/c/22661/6/be/src/exec/tuple-text-file-util.cc@158
PS6, Line 158: // Files are supposed to be written in tuple-text-file-writer.cc, and are plain
> Is file size an adequate check here? Please comment on assumptions here. I

This check is just one part of the fast path. If the file sizes are the same, the code still proceeds to a line-by-line comparison.

If the file sizes differ, that is most likely due to actual content changes, because the files are plain text with \n as the row delimiter and no padding:
https://github.com/apache/impala/blob/f98b697c7b37e18cb1101b62243974e42f72b9f4/be/src/exec/tuple-text-file-writer.cc#L50

While file size is probably adequate for our case, this change mainly ensures that more detailed differences are printed when the sizes differ, by falling back to the slow path.

Added a comment about the file format.


--
To view, visit http://gerrit.cloudera.org:8080/22661
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02e031410dac32d9df746201b156783a8b7d9a1a
Gerrit-Change-Number: 22661
Gerrit-PatchSet: 7
Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Comment-Date: Tue, 08 Apr 2025 16:34:40 +0000
Gerrit-HasComments: Yes
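
For readers following the fast-path and slow-path discussion in the comment above, here is a minimal sketch of that control flow. It is not the code in tuple-text-file-util.cc: the names FastPathEqual, DescribeDifference and VerifyFiles are hypothetical, and the slow path shown here only reports the first differing row, which is an assumption made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper, not Impala's API. Fast path: a cheap size check,
// then a line-by-line equality check that stops at the first mismatch.
bool FastPathEqual(const fs::path& a, const fs::path& b) {
  std::error_code ec;
  const std::uintmax_t size_a = fs::file_size(a, ec);
  if (ec) return false;
  const std::uintmax_t size_b = fs::file_size(b, ec);
  if (ec) return false;
  // Different sizes mean the bytes differ, so there is no need to read the
  // files here; the caller falls back to the slow path for details.
  if (size_a != size_b) return false;

  std::ifstream fa(a), fb(b);
  if (!fa || !fb) return false;
  std::string la, lb;
  while (true) {
    const bool has_a = static_cast<bool>(std::getline(fa, la));
    const bool has_b = static_cast<bool>(std::getline(fb, lb));
    if (!has_a || !has_b) return has_a == has_b;  // Equal only if both ended.
    if (la != lb) return false;
  }
}

// Hypothetical slow path: rescans both files and describes the first
// difference. Returns an empty string when no differing row is found.
std::string DescribeDifference(const fs::path& a, const fs::path& b) {
  std::ifstream fa(a), fb(b);
  if (!fa || !fb) return "cannot open input file";
  std::string la, lb;
  for (std::size_t row = 1;; ++row) {
    const bool has_a = static_cast<bool>(std::getline(fa, la));
    const bool has_b = static_cast<bool>(std::getline(fb, lb));
    if (!has_a && !has_b) return "";
    if (has_a != has_b) {
      return "row count differs at row " + std::to_string(row);
    }
    if (la != lb) {
      return "row " + std::to_string(row) + " differs: '" + la + "' vs '" +
             lb + "'";
    }
  }
}

// Runs the fast path first and falls back to the slow path on any mismatch,
// including a size mismatch, so the caller always gets a detailed message.
std::string VerifyFiles(const fs::path& a, const fs::path& b) {
  if (FastPathEqual(a, b)) return "";
  return DescribeDifference(a, b);
}
```

In this sketch a size mismatch does not end verification with a bare "sizes differ" error; it sends the comparison to the slow path, which matches the intent described in the comment of printing more detailed differences.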