Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21932 )
Change subject: IMPALA-13509: Copy rows directly to OutboundRowBatch during hash partitioning ...................................................................... Patch Set 13: (5 comments) http://gerrit.cloudera.org:8080/#/c/21932/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21932/12//COMMIT_MSG@67 PS12, Line 67: TPCH benchmarks improved significantly > Nice improvements! Did you also measure TPC-DS? tpcds also had ~5% improvement http://gerrit.cloudera.org:8080/#/c/21932/12/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: http://gerrit.cloudera.org:8080/#/c/21932/12/be/src/runtime/descriptors.h@845 PS12, Line 845: /// Number of tuples per row. Has IR_NO_INLINE to make it replacable with constant > nit: Could you please add comment to this function? Done http://gerrit.cloudera.org:8080/#/c/21932/12/be/src/runtime/krpc-data-stream-sender-ir.cc File be/src/runtime/krpc-data-stream-sender-ir.cc: http://gerrit.cloudera.org:8080/#/c/21932/12/be/src/runtime/krpc-data-stream-sender-ir.cc@35 PS12, Line 35: // codegend. : int channel_ids[RowBatch::HASH_BATCH_SIZE]; : int row_idx = 0; : while (row_idx < num_rows) { : int row_count = 0; : FOREACH_ROW_LIMIT(batch, row_idx, RowBatch::HASH_BATCH_SIZE, row_batch_iter) { : TupleRow* row = row_batch_iter.Get(); : channel_ids[row_count++] = HashRow(row, exchange_hash_seed_) % num_channels; : } : row_count = 0; : FOREACH_ROW_LIMIT(batch, row_idx, RowBatch::HASH_BATCH_SIZE, row_batch_iter) { : int channel_id = channel_ids[row_count++]; : PartitionRowCollector& collector = partition_row_collectors_[channel_id]; : > This code is a bit subtle due to the micro-optimization introduced by https Added comment. Note that this logic was added before implementing codegen in KrpcDataStreamSender, so the things written in IMPALA-6461 may not be true anymore. I still think that it makes sense to have these micro batches to allow better pipelining in the hashing part which would be ruined by the many branches in deep copy, but this is just speculation. http://gerrit.cloudera.org:8080/#/c/21932/12/be/src/runtime/krpc-data-stream-sender.h File be/src/runtime/krpc-data-stream-sender.h: http://gerrit.cloudera.org:8080/#/c/21932/12/be/src/runtime/krpc-data-stream-sender.h@184 PS12, Line 184: TransmitData( > typo Done http://gerrit.cloudera.org:8080/#/c/21932/12/be/src/runtime/outbound-row-batch.inline.h File be/src/runtime/outbound-row-batch.inline.h: http://gerrit.cloudera.org:8080/#/c/21932/12/be/src/runtime/outbound-row-batch.inline.h@76 PS12, Line 76: &tuple_data_.back()) + 1; > Maybe 'tuple_data_.back()' is simpler. Done -- To view, visit http://gerrit.cloudera.org:8080/21932 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I81a16c2f0fcfc1f3adef7077b3932a29a0f15a8f Gerrit-Change-Number: 21932 Gerrit-PatchSet: 13 Gerrit-Owner: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Sat, 09 Nov 2024 15:42:56 +0000 Gerrit-HasComments: Yes
