[ https://issues.apache.org/jira/browse/FLINK-36526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zakelly Lan updated FLINK-36526:
--------------------------------
    Description:
Currently, ForSt passes a direct buffer to {{ByteBufferWritableFSDataOutputStream}}, which writes the data one byte at a time. According to our profiling, the statistics of the Hadoop-based file system are updated once for each byte, which consumes a lot of CPU. Below is a flame graph in which the statistics part is marked in purple (8.14% of the overall CPU).

!image-2024-10-14-15-52-41-457.png|width=1296,height=616!

It might be better to copy the data to a heap buffer before invoking write.

  was:
Currently, ForSt passes a direct buffer to {{ByteBufferWritableFSDataOutputStream}}, which writes the data one byte at a time. According to our profiling, the statistics of the Hadoop-based file system are updated once for each byte, which consumes a lot of CPU.

!image-2024-10-14-15-52-41-457.png!

It might be better to copy the data to a heap buffer before invoking write.

> Optimize the overhead of writing with direct buffer in ForSt
> -------------------------------------------------------------
>
>                 Key: FLINK-36526
>                 URL: https://issues.apache.org/jira/browse/FLINK-36526
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / State Backends
>            Reporter: Zakelly Lan
>            Assignee: Zakelly Lan
>            Priority: Major
>         Attachments: image-2024-10-14-15-52-41-457.png

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
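
To illustrate the suggestion in the description, here is a minimal sketch of the two write paths. It is not the ForSt or Flink implementation: the class {{DirectBufferWriteSketch}}, the {{CountingOutputStream}} stand-in (which counts write calls in place of real file system statistics), and the 1024-byte chunk size are all assumptions made for illustration. It uses only {{java.nio.ByteBuffer}} and {{java.io.OutputStream}}.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

class DirectBufferWriteSketch {

    /** Hypothetical stand-in for a stream that does per-call bookkeeping (e.g. statistics). */
    static final class CountingOutputStream extends OutputStream {
        private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int writeCalls = 0;

        @Override
        public void write(int b) {
            writeCalls++; // one bookkeeping update per single-byte write
            sink.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeCalls++; // one bookkeeping update per bulk write
            sink.write(b, off, len);
        }
    }

    /** Slow path: drains the buffer with one write call per byte. */
    static void writeByteByByte(ByteBuffer src, OutputStream out) throws IOException {
        while (src.hasRemaining()) {
            out.write(src.get());
        }
    }

    /** Suggested path: copy into a heap array in chunks, then issue bulk writes. */
    static void writeViaHeapBuffer(ByteBuffer src, OutputStream out, int chunkSize)
            throws IOException {
        byte[] heap = new byte[chunkSize];
        while (src.hasRemaining()) {
            int n = Math.min(heap.length, src.remaining());
            src.get(heap, 0, n); // bulk copy from direct memory to the heap array
            out.write(heap, 0, n);
        }
    }

    /** Writes payloadSize bytes from a direct buffer and returns the number of write calls. */
    static int countCalls(boolean copyToHeap, int payloadSize) {
        ByteBuffer direct = ByteBuffer.allocateDirect(payloadSize);
        for (int i = 0; i < payloadSize; i++) {
            direct.put((byte) i);
        }
        direct.flip();

        CountingOutputStream out = new CountingOutputStream();
        try {
            if (copyToHeap) {
                writeViaHeapBuffer(direct, out, 1024); // chunk size is an arbitrary choice
            } else {
                writeByteByByte(direct, out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.writeCalls;
    }

    public static void main(String[] args) {
        System.out.println("byte-by-byte write calls: " + countCalls(false, 4096));
        System.out.println("heap-copy write calls:    " + countCalls(true, 4096));
    }
}
```

For a 4096-byte payload, the byte-by-byte path issues 4096 write calls while the chunked heap copy issues 4, so any per-call bookkeeping in the underlying stream runs far less often. The trade-off is one extra memory copy per chunk.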