[jira] [Created] (FLINK-28373) Read larger size of data sequentially for sort-shuffle

Yingjie Cao (Jira) Mon, 04 Jul 2022 01:14:06 -0700

Yingjie Cao created FLINK-28373:
-----------------------------------

             Summary: Read larger size of data sequentially for sort-shuffle
                 Key: FLINK-28373
                 URL: https://issues.apache.org/jira/browse/FLINK-28373
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Network
            Reporter: Yingjie Cao
             Fix For: 1.16.0



Currently, for sort blocking shuffle, the corresponding data readers read 
shuffle data at buffer granularity. Before compression, each buffer is 32K by 
default; after compression it becomes smaller (possibly less than 10K). For 
file IO, this is quite small. To achieve better performance and reduce IOPS, 
we can merge consecutive data requests on the same file and serve them with 
one IO request. More specifically,

1) if multiple data requests are reading the same data, for example, reading 
broadcast data, the reader will read the data only once and send the same piece 
of data to multiple downstream consumers.

2) if multiple data requests are reading consecutive data in one file, we 
will merge those data requests into one large request and read a larger 
amount of data sequentially, which benefits file IO performance.
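To illustrate the two merging rules above, here is a minimal Java sketch. It is not Flink's implementation. The names DataRequest, MergedRead and merge are hypothetical, and it assumes each request is a plain byte range [offset, offset + length) within one shuffle file. Requests are sorted by offset. A request whose range starts at or before the end of the current read is folded into it, so identical ranges (rule 1, e.g. broadcast data) share one read and adjacent ranges (rule 2) become one larger sequential read.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RequestMerger {

    /** Hypothetical request: one downstream consumer wants [offset, offset + length) of a file. */
    public record DataRequest(int consumerId, long offset, int length) {
        long end() { return offset + length; }
    }

    /** Hypothetical result: one sequential IO over [offset, offset + length) serving several requests. */
    public record MergedRead(long offset, long length, List<DataRequest> requests) {}

    /**
     * Groups requests on the same file into as few sequential reads as possible.
     * Identical or overlapping ranges (e.g. broadcast data) share one read, and
     * ranges that start exactly where the previous one ends are appended to it.
     */
    public static List<MergedRead> merge(List<DataRequest> requests) {
        List<DataRequest> sorted = new ArrayList<>(requests);
        sorted.sort(Comparator.comparingLong(DataRequest::offset));

        List<MergedRead> reads = new ArrayList<>();
        List<DataRequest> group = new ArrayList<>();
        long groupStart = 0;
        long groupEnd = 0;
        for (DataRequest r : sorted) {
            if (!group.isEmpty() && r.offset() <= groupEnd) {
                // Same or consecutive data: extend the current read instead of issuing a new IO.
                group.add(r);
                groupEnd = Math.max(groupEnd, r.end());
            } else {
                // Gap in the file: close the current read and start a new one.
                if (!group.isEmpty()) {
                    reads.add(new MergedRead(groupStart, groupEnd - groupStart, group));
                }
                group = new ArrayList<>();
                group.add(r);
                groupStart = r.offset();
                groupEnd = r.end();
            }
        }
        if (!group.isEmpty()) {
            reads.add(new MergedRead(groupStart, groupEnd - groupStart, group));
        }
        return reads;
    }

    public static void main(String[] args) {
        List<DataRequest> requests = List.of(
                new DataRequest(0, 0, 8_192),       // broadcast: consumers 0 and 1 read the same bytes
                new DataRequest(1, 0, 8_192),
                new DataRequest(2, 8_192, 9_000),   // starts where the previous range ends
                new DataRequest(3, 40_000, 7_500)); // gap before it, so it needs a separate read
        for (MergedRead read : merge(requests)) {
            System.out.println("read offset=" + read.offset() + " length=" + read.length()
                    + " serving " + read.requests().size() + " request(s)");
        }
    }
}
```

With these inputs, four buffer-sized requests collapse into two IOs: one 17,192-byte read serving three consumers and one 7,500-byte read. After a merged read completes, a reader would hand each consumer its slice starting at request.offset() - read.offset(); that step, and any upper bound on the merged read size, are left out of this sketch.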



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
