JingsongLi commented on pull request #17520: URL: https://github.com/apache/flink/pull/17520#issuecomment-972468104
Hi all, I'll give my understanding. (Correct me if I am wrong.)

## Object ArrayList vs Lazy deserialization

As long as the objects inside the `ArrayList` are not promoted to the GC old generation, the performance difference is not significant.

If we use `ArrayList`, there is a trade-off in choosing its capacity:

- Larger capacity: as downstream processing gets more complex, elements stay alive longer and may be promoted to the old generation, where they are much more expensive to collect.
- Smaller capacity: the extreme case is 1, which is too costly for `BlockArrayQueue` and seriously affects throughput.

Since this trade-off is difficult to control, we try not to use a collection of objects. If we must bundle data, we use a structure similar to BytesMap (only binary, no objects).

## Lazy deserialization in StreamFormat

The key problem is that `StreamFormat` has no way to know the real demarcation point of the implementation, so the implementation may read past the end of its data and hit an EOF exception.

Is it possible for `StreamFormat` to expose a block-like interface that lets implementations define the demarcation of a block, or to have each compressed block define its own demarcation point?
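
To make the block idea for `StreamFormat` a bit more concrete, here is a minimal sketch, assuming a simple length-prefixed framing. `BlockFramingSketch`, `writeBlocks`, `readBlocks`, and `decode` are hypothetical names used only for illustration; none of them are part of Flink's `StreamFormat` API. The point is that the reader hands out whole blocks as raw bytes (binary only, no objects), and records are deserialized later, strictly inside a block's boundary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of block demarcation; not Flink API. */
final class BlockFramingSketch {

    /** Writes every block as [4-byte length][payload]; records use writeUTF. */
    static byte[] writeBlocks(List<List<String>> blocks) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(out);
        for (List<String> block : blocks) {
            ByteArrayOutputStream payload = new ByteArrayOutputStream();
            DataOutputStream records = new DataOutputStream(payload);
            for (String record : block) {
                records.writeUTF(record);
            }
            records.flush();
            data.writeInt(payload.size()); // the demarcation point of this block
            payload.writeTo(data);
        }
        data.flush();
        return out.toByteArray();
    }

    /** Splits the stream into raw blocks without deserializing any record. */
    static List<byte[]> readBlocks(byte[] stream) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream));
        List<byte[]> blocks = new ArrayList<>();
        while (true) {
            int first = in.read();
            if (first < 0) {
                return blocks; // clean end of input, exactly on a block boundary
            }
            // Rebuild the big-endian length written by writeInt.
            int length = (first << 24)
                    | (in.readUnsignedByte() << 16)
                    | (in.readUnsignedByte() << 8)
                    | in.readUnsignedByte();
            byte[] block = new byte[length];
            in.readFully(block);
            blocks.add(block);
        }
    }

    /** Lazily deserializes one block; it can never read past the block's end. */
    static List<String> decode(byte[] block) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(block));
        List<String> records = new ArrayList<>();
        while (in.available() > 0) {
            records.add(in.readUTF());
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[] stream = writeBlocks(Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c")));
        for (byte[] block : readBlocks(stream)) {
            System.out.println(decode(block));
        }
    }
}
```

With a framing like this, whoever splits or batches the input only needs the block boundaries, and the format implementation decides what a block contains (for example, one compressed block).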