On Thu, Oct 14, 2021 at 1:21 PM Jeevan Ladhe <jeevan.la...@enterprisedb.com> wrote: > Agree. Removed the CHUNK_SIZE and the loop.
Try harder. :-) The loop is gone, but CHUNK_SIZE itself seems to have evaded the executioner. > Fair enough. I have made the change in the bbsink_lz4_begin_backup() to > make sure we reserve enough extra bytes for the header and the footer those > are written by LZ4F_compressBegin() and LZ4F_compressEnd() respectively. > The LZ4F_compressBound() when passed the input size as "0", would give > the upper bound for output buffer needed by the LZ4F_compressEnd(). I think this is not the best way to accomplish the goal. Adding LZ4F_compressBound(0) to next_buf_len makes the buffer substantially bigger for something that's only going to happen once. We are assuming in any case, I think, that LZ4F_compressBound(0) <= LZ4F_compressBound(mysink->base.bbs_buffer_length), so all you need to do is have bbsink_end_archive() empty the buffer, if necessary, before calling LZ4F_compressEnd(). With just that change, you can set next_buf_len = LZ4F_HEADER_SIZE_MAX + mysink->output_buffer_bound -- but that's also more than you need. You can instead do next_buf_len = Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound). Now, you're probably thinking that won't work, because bbsink_lz4_begin_archive() could fill up the buffer partway, and then the first call to bbsink_lz4_archive_contents() could overrun it. But that problem can be solved by reversing the order of operations in bbsink_lz4_archive_contents(): before you call LZ4F_compressUpdate(), test whether you need to empty the buffer first, and if so, do it. That's actually less confusing than the way you've got it, because as you have it written, we don't really know why we're emptying the buffer -- is it to prepare for the next call to LZ4F_compressUpdate(), or is it to prepare for the call to LZ4F_compressEnd()? How do we know now how much space the next person writing into the buffer is going to need? It seems better if bbsink_lz4_archive_contents() empties the buffer before calling LZ4F_compressUpdate() if that call might not have enough space, and likewise bbsink_lz4_end_archive() empties the buffer before calling LZ4F_compressEnd() if that's needed. That way, each callback makes the space *it* needs, not the space the *next* caller needs. (bbsink_lz4_end_archive() still needs to ALSO empty the buffer after LZ4F_compressEnd(), so we don't orphan any data.) On another note, if the call to LZ4F_freeCompressionContext() is required in bbsink_lz4_end_archive(), then I think this code is going to just leak the memory used by the compression context if an error occurs before this code is reached. That kind of sucks. The way to fix it, I suppose, is a TRY/CATCH block, but I don't think that can be something internal to basebackup_lz4.c: I think the bbsink stuff would need to provide some kind of infrastructure for basebackup_lz4.c to use. It would be a lot better if we could instead get LZ4 to allocate memory using palloc(), but a quick Google search suggests that you can't accomplish that without recompiling liblz4, and that's not workable since we don't want to require a liblz4 built specifically for PostgreSQL. Do you see any other solution? -- Robert Haas EDB: http://www.enterprisedb.com