github-actions[bot] commented on code in PR #64091:
URL: https://github.com/apache/doris/pull/64091#discussion_r3464295451
##########
be/src/util/decompressor.cpp:
##########
@@ -552,10 +552,12 @@ Status Lz4BlockDecompressor::decompress(uint8_t* input, uint32_t input_len,
break;
}
- // Decompress this block.
+ // Decompress this block. Capacity must track output_ptr: remaining_output_len
+ // is fixed per large block while output_ptr advances per small block.
auto decompressed_small_block_len = LZ4_decompress_safe(
reinterpret_cast<const char*>(input_ptr),
reinterpret_cast<char*>(output_ptr),
- compressed_small_block_len, remaining_output_len);
+ compressed_small_block_len,
+ cast_set<uint32_t>(output_max_len - (output_ptr - output)));
Review Comment:
This memory-safety fix needs an automated test for the exact boundary it
changes. The current PR only records a manual guard-page validation, while
existing LZ4 coverage exercises normal segment `LZ4`/`LZ4F` round trips or load
parsing, not `TFileCompressType::LZ4BLOCK`/`Lz4BlockDecompressor` with one
Hadoop large block split into multiple small blocks. Please add a focused BE
unit test that builds such a stream where the first small block advances
`output_ptr` and the second would exceed the remaining output capacity, then
assert the decoder returns an error without writing past the buffer. The
related Snappy bounds fix added
`be/test/util/snappy_block_decompressor_test.cpp`; this LZ4 fix should have
analogous CI coverage.
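
For illustration only, the boundary described above can be sketched without linking LZ4. This is a minimal sketch, not the Doris test: `copy_block_safe` is a hypothetical stand-in for `LZ4_decompress_safe` (it copies bytes and returns -1 when the source exceeds the destination capacity), and `decode_small_blocks` and `overflow_is_rejected` are assumed helper names that do not exist in the code base. It shows the capacity being recomputed from `output_ptr` for every small block, and a canary region proving that nothing is written past the buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for LZ4_decompress_safe: "decompresses" by copying and
// returns -1 when the source does not fit into dst_capacity.
static int copy_block_safe(const char* src, char* dst, int src_len, int dst_capacity) {
    if (src_len < 0 || dst_capacity < 0 || src_len > dst_capacity) return -1;
    if (src_len == 0) return 0;
    std::memcpy(dst, src, static_cast<size_t>(src_len));
    return src_len;
}

// Decodes consecutive small blocks into output[0, output_max_len).
// Returns the total bytes written, or -1 on the first block that does not fit.
static int64_t decode_small_blocks(const std::vector<std::vector<char>>& blocks,
                                   uint8_t* output, size_t output_max_len) {
    uint8_t* output_ptr = output;
    for (const auto& block : blocks) {
        assert(static_cast<size_t>(output_ptr - output) <= output_max_len);
        // Capacity tracks output_ptr instead of staying fixed per large block.
        size_t remaining = output_max_len - static_cast<size_t>(output_ptr - output);
        int n = copy_block_safe(block.data(), reinterpret_cast<char*>(output_ptr),
                                static_cast<int>(block.size()),
                                static_cast<int>(remaining));
        if (n < 0) return -1;
        output_ptr += n;
    }
    return output_ptr - output;
}

// Boundary case: 8-byte output, first small block is 6 bytes, second is 4.
// Returns true if decoding fails and the canary bytes after the buffer survive.
static bool overflow_is_rejected() {
    constexpr size_t kCapacity = 8;
    constexpr size_t kCanary = 8;
    std::vector<uint8_t> buf(kCapacity + kCanary, 0xAB);
    std::vector<std::vector<char>> blocks = {std::vector<char>(6, 'a'),
                                             std::vector<char>(4, 'b')};
    if (decode_small_blocks(blocks, buf.data(), kCapacity) != -1) return false;
    for (size_t i = kCapacity; i < buf.size(); ++i) {
        if (buf[i] != 0xAB) return false;
    }
    return true;
}
```

A real BE unit test would presumably replace the stand-in with an actual Hadoop-framed LZ4 stream and drive `Lz4BlockDecompressor::decompress` directly, but the assertion shape (error returned, bytes past the buffer unchanged) would be the same.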
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]