Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21797 )

Change subject: IMPALA-12594: Add flag to tune KrpcDataStreamSender mem estimate
......................................................................

Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21797/2/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
File fe/src/main/java/org/apache/impala/planner/DataStreamSink.java:

http://gerrit.cloudera.org:8080/#/c/21797/2/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java@98
PS2, Line 98:     if (fixedLenRowSize==0) fixedLenRowSize = 1; // avoid division by 0
:     long beRowsPerBuffer =
:         Math.max(1, (long)Math.ceil(beBufferBytes/fixedLenRowSize));
> This can blow up if all columns are var-len type, and data_stream_sender_bu
As we also discussed on another channel, this shouldn't lead to many more rows than the original batch_size of 1024. A string (or collection) column adds 12 bytes to the fixed-length row size, while 1024 rows are used when the row size is 16 bytes (default data_stream_sender_buffer_size=16K = 1024*16), so a single large string column can raise the memory estimate, but not by much.

--
To view, visit http://gerrit.cloudera.org:8080/21797
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1e4b1db030be934cece565e3f2634ee7cbdb7c4f
Gerrit-Change-Number: 21797
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Sat, 14 Sep 2024 09:29:09 +0000
Gerrit-HasComments: Yes
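
The row-count arithmetic in the comment above can be checked with a small standalone sketch. It is not the planner code from the patch: the class name, the constants, and the explicit cast to double are assumptions made for illustration (the types of beBufferBytes and fixedLenRowSize are not visible in the quoted snippet), and the 16K buffer size and 12-byte slot size are taken from the comment itself.

```java
public class RowsPerBufferSketch {
  // Assumed default of data_stream_sender_buffer_size (16K), as stated in the comment.
  static final long BUFFER_BYTES = 16 * 1024;
  // Assumed fixed-length contribution of one string or collection column, as stated in the comment.
  static final long VAR_LEN_SLOT_BYTES = 12;

  // Mirrors the shape of the quoted lines; the cast to double is an assumption
  // so that the division is not truncated before Math.ceil is applied.
  static long rowsPerBuffer(long bufferBytes, long fixedLenRowSize) {
    if (fixedLenRowSize == 0) fixedLenRowSize = 1; // avoid division by 0
    return Math.max(1, (long) Math.ceil((double) bufferBytes / fixedLenRowSize));
  }

  public static void main(String[] args) {
    // A 16-byte row reproduces the original batch_size of 1024 rows.
    System.out.println(rowsPerBuffer(BUFFER_BYTES, 16));
    // A row with a single string column (12 bytes of fixed-length data).
    System.out.println(rowsPerBuffer(BUFFER_BYTES, VAR_LEN_SLOT_BYTES));
  }
}
```

Under these assumptions a row consisting of a single string column yields 1366 rows per buffer, about a third more than 1024, which matches the "not by much" estimate.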
