Hi Jinzhong,

Batching state access is a reasonable way to reduce the amount of I/O compared to per-record state access. However, I have some questions:
- In my opinion, we need to reduce the number of times RocksDB SST files are fetched from remote storage to local. The FLIP seems to batch the RocksDB put/get requests. I am not sure this will reduce the number of SST fetches.
- How do we monitor the I/O used by state disaggregation? The latency and volume of I/O on the DFS are important for performance diagnosis. Moreover, the volume of I/O also influences the stability of the DFS. For example, the throughput of the HDFS NameNode is hard to scale and suffers under a flood of I/O requests.
- How about making the batching strategy customizable? Intuitively, an extremely large batch may need a lot of memory to hold the returned results and cause an OOM. On the other hand, if we batch keys randomly, the state storage may have to search all SSTs to find the results.

Best wishes,
Rui Xia.

________________________________
From: Hangxiang Yu <master...@gmail.com>
Sent: March 15, 2024, 4:04
To: xiarui0...@hotmail.com
Subject: Fwd: [DISCUSS] FLIP-426: Grouping Remote State Access

---------- Forwarded message ---------
From: Jinzhong Li <lijinzhong2...@gmail.com>
Date: Thu, Mar 7, 2024 at 4:52 PM
Subject: [DISCUSS] FLIP-426: Grouping Remote State Access
To: <dev@flink.apache.org>
Cc: <yuanmei.w...@gmail.com>, <zakelly....@gmail.com>, <master...@gmail.com>, <fredia...@gmail.com>, <fengw...@apache.org>

Hi devs,

I'd like to start a discussion on a sub-FLIP of FLIP-423: Disaggregated State Storage and Management[1], which is a joint work of Yuan Mei, Zakelly Lan, Jinzhong Li, Hangxiang Yu, Yanfei Lei and Feng Wang:

- FLIP-426: Grouping Remote State Access <https://cwiki.apache.org/confluence/display/FLINK/FLIP-426%3A+Grouping+Remote+State+Access> [2]

This FLIP enables retrieval of remote state data in batches to avoid unnecessary round-trip costs for remote access.
Please make sure you have read FLIP-423[1] to know the whole story, and we'll discuss the details of FLIP-426[2] under this mail. For discussion of the overall architecture or topics related to multiple sub-FLIPs, please post in the previous mail[3].

Looking forward to hearing from you!

[1] https://cwiki.apache.org/confluence/x/R4p3EQ
[2] https://cwiki.apache.org/confluence/x/TYp3EQ
[3] https://lists.apache.org/thread/ct8smn6g9y0b8730z7rp9zfpnwmj8vf0

Best,
Jinzhong Li

--
Best,
Hangxiang.
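
To make the batching-strategy question raised in the reply above more concrete, here is a minimal sketch of one possible strategy. It is not taken from the FLIP and does not use any Flink or RocksDB API: the class `BatchPlanner`, the method `plan`, and the parameter `maxBatchSize` are hypothetical names chosen for illustration. The sketch caps the batch size so the memory needed for results stays bounded, and sorts the pending keys first so that each batch covers a contiguous key range. It assumes that the comparison order of the keys matches the order in which the store lays them out, which may not hold for serialized state keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Flink or the FLIP.
public class BatchPlanner {

    // Splits the pending keys into sorted batches of at most maxBatchSize keys.
    public static <K extends Comparable<K>> List<List<K>> plan(List<K> keys, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        // Sort a copy so the caller's list is left untouched.
        List<K> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);

        // Cut the sorted keys into consecutive chunks; the last one may be smaller.
        List<List<K>> batches = new ArrayList<>();
        for (int start = 0; start < sorted.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, sorted.size());
            batches.add(new ArrayList<>(sorted.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> pending = Arrays.asList("k7", "k2", "k9", "k1", "k5");
        // prints [[k1, k2], [k5, k7], [k9]]
        System.out.println(plan(pending, 2));
    }
}
```

A real strategy would probably bound the batch by the estimated result size in bytes rather than by key count, since value sizes vary. The key-count limit is used here only to keep the sketch short.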