Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/24126

to look at the new patch set (#2).

Change subject: PROTOTYPE: Partitioned exchanges should accumulate rows up to the buffer size
......................................................................

PROTOTYPE: Partitioned exchanges should accumulate rows up to the buffer size

For a partitioned exchange, the sender sets aside a buffer for
each receiver based on data_stream_sender_buffer_size (default
value of 16KB). It currently takes the buffer size and divides
it by the size of each row to produce a limit on the number of
rows to accumulate per receiver.

However, the row size can be overestimated significantly, so
this can actually use much less space than the desired buffer
size. This is particularly common for multiple-aggregations
where a row is expected to have one tuple set out of many.
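
To make the gap concrete, here is a minimal sketch of the row-count limit described above. The helper names, the clamp to at least one row, and the example row sizes are illustrative assumptions, not code or figures from this change.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: the per-receiver row cap obtained by dividing the
// buffer size by the *estimated* row size. The clamp to 1 is an assumption
// made so that a batch can always hold at least one row.
inline int64_t RowLimitFromEstimate(int64_t buffer_size_bytes,
                                    int64_t estimated_row_size_bytes) {
  assert(buffer_size_bytes > 0 && estimated_row_size_bytes > 0);
  return std::max<int64_t>(1, buffer_size_bytes / estimated_row_size_bytes);
}

// Hypothetical helper: bytes actually occupied once the row cap is reached,
// given the real serialized size of each row.
inline int64_t BytesUsedAtRowLimit(int64_t row_limit,
                                   int64_t actual_row_size_bytes) {
  assert(row_limit > 0 && actual_row_size_bytes > 0);
  return row_limit * actual_row_size_bytes;
}
```

With a 16KB buffer and a (made-up) estimated row size of 1024 bytes, the cap is 16 rows. If each row actually serializes to 64 bytes, the batch is sent with only 1024 bytes in it, a small fraction of the intended buffer.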

This changes the logic to allow rows to continue to accumulate
as long as the size of the outbound batch is less than the
buffer size. It does not cap messages at the buffer size, so the
only effect is to consolidate smaller messages into larger ones.
Since the extra accumulation is limited to the buffer size, it
doesn't increase the risk of extremely large messages.
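
The difference between the two flush rules can be sketched as a small simulation. This is an illustration under assumptions, not the code in krpc-data-stream-sender.cc: FlushPolicy, CountMessages, and the exact flush condition are hypothetical names and a simplified reading of the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical policy description for one receiver's outbound batch.
struct FlushPolicy {
  int64_t buffer_size_bytes;       // e.g. data_stream_sender_buffer_size
  int64_t row_limit;               // cap derived from the estimated row size
  bool accumulate_to_buffer_size;  // false = old rule, true = new rule
};

// Counts how many messages are sent for rows with the given actual sizes.
// Old rule: flush as soon as the row cap is reached.
// New rule (assumed reading): once the row cap is reached, keep accumulating
// while the batch is still smaller than the buffer size.
inline int64_t CountMessages(const std::vector<int64_t>& row_sizes,
                             const FlushPolicy& policy) {
  assert(policy.buffer_size_bytes > 0 && policy.row_limit > 0);
  int64_t messages = 0;
  int64_t rows = 0;
  int64_t bytes = 0;
  for (int64_t size : row_sizes) {
    ++rows;
    bytes += size;
    const bool hit_row_limit = rows >= policy.row_limit;
    const bool under_buffer = bytes < policy.buffer_size_bytes;
    const bool flush = policy.accumulate_to_buffer_size
                           ? (hit_row_limit && !under_buffer)
                           : hit_row_limit;
    if (flush) {
      ++messages;
      rows = 0;
      bytes = 0;
    }
  }
  if (rows > 0) ++messages;  // Send whatever is left at end of stream.
  return messages;
}
```

In this sketch, batches that already reach the buffer size at the row cap are flushed exactly as before, so only undersized batches are merged.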

On TPC-DS Q67, this makes an enormous difference for one of the
main exchanges, reducing the number of messages by over 5x.
Before:
 - RpcNetworkTime: (Avg: 275.289us ; Min: 14.520us ; Max: 1.935ms ; Sum: 14s202ms ; Number of samples: 51591)
 - InactiveTotalTime: 1s538ms
 - UncompressedRowBatchSize: 206.84 MB (216886146)

After:
 - RpcNetworkTime: (Avg: 555.997us ; Min: 18.844us ; Max: 5.526ms ; Sum: 5s073ms ; Number of samples: 9125)
 - InactiveTotalTime: 407.136ms
 - UncompressedRowBatchSize: 206.66 MB (216697921)
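
As a rough cross-check of the counters above, the arithmetic can be written out as follows. It assumes that the RpcNetworkTime sample count corresponds to the number of messages sent and that UncompressedRowBatchSize is the total payload across those messages. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the two profile counters quoted above.
struct ExchangeCounters {
  int64_t rpc_samples;         // "Number of samples" under RpcNetworkTime
  int64_t uncompressed_bytes;  // UncompressedRowBatchSize in bytes
};

// Ratio of message counts before and after the change.
inline double MessageReduction(const ExchangeCounters& before,
                               const ExchangeCounters& after) {
  assert(after.rpc_samples > 0);
  return static_cast<double>(before.rpc_samples) / after.rpc_samples;
}

// Average uncompressed payload per message, under the assumptions above.
inline double AvgBytesPerMessage(const ExchangeCounters& counters) {
  assert(counters.rpc_samples > 0);
  return static_cast<double>(counters.uncompressed_bytes) / counters.rpc_samples;
}

// Values copied from the Before/After profile snippets.
const ExchangeCounters kBefore{51591, 216886146};
const ExchangeCounters kAfter{9125, 216697921};
```

Under those assumptions the message count drops by roughly 5.65x while the total uncompressed bytes stay nearly the same, so the average payload per message grows by about the same factor.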

TPC-DS Q67 got 30% faster based on this:
| TPCDS(20) | TPCDS-Q67 | parquet / none / none | 3.10 | 4.48 | I -30.85% | 1.17% | 0.63% | 25 | I -30.93% | -5.94 | -150.98 |

Change-Id: Ic9a677f558fff18fae8e5b2f57920dff6df9388e
---
M be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/outbound-row-batch.h
M be/src/runtime/outbound-row-batch.inline.h
5 files changed, 23 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/24126/2
--
To view, visit http://gerrit.cloudera.org:8080/24126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9a677f558fff18fae8e5b2f57920dff6df9388e
Gerrit-Change-Number: 24126
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
