Hi everyone,

Thanks for your valuable feedback!
Our discussions have been going on for a while, and this sub-FLIP of FLIP-423 is nearing a consensus, so I would like to start a vote after 72 hours. Please let me know if you have any concerns, thanks!

On Mon, Mar 11, 2024 at 11:48 AM Hangxiang Yu <master...@gmail.com> wrote:

> Hi, Jeyhun.
>
> Thanks for the reply.
>
> Is this argument true for all workloads? Or does this argument also hold
> for workloads with many small files, which is quite a common case [1]?
>
> Yes, I think so. The overhead should still be considered negligible,
> particularly in comparison to remote I/O, and the other benefits of this
> proposal may well outweigh it.
>
> Additionally, there is already JNI overhead when Flink calls RocksDB
> methods today. The frequency of those calls could surpass that of the
> actual file system interface calls, given that not all state requests
> require accessing the file system.
>
> BTW, the issue with small files can also impact the performance of the db
> with a local file system at runtime, so we usually resolve it first in the
> production environment.
>
> the engine spawns a huge amount of scan-range requests to the
> file system to retrieve different parts of a file.
>
> Indeed, frequent requests to the remote file system can significantly
> affect performance. To address this, other FLIPs have introduced several
> strategies:
>
> 1. A local disk cache to minimize remote requests, as described in
> FLIP-423, which we will detail in FLIP-429 as you mentioned. With
> effective cache utilization, performance will not be inferior to the
> local strategy on cache hits.
>
> 2. Grouping remote access to decrease the number of remote I/O requests,
> as proposed in "FLIP-426: Grouping Remote State Access."
>
> 3. Parallel I/O to maximize network bandwidth usage, as outlined in
> "FLIP-425: Asynchronous Execution Model."
>
> The PoC implements a simple file cache and asynchronous execution, which
> already improves performance a lot. You could also refer to the PoC
> results in FLIP-423.
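To make the combination of the local cache and asynchronous execution above a bit more concrete, here is a minimal sketch of what such a read path could look like. Please note it is only an illustration for this thread: the class and method names (CachedRangeReader, readAsync, the ioExecutor pool, ...) are made up here and are not the actual ForSt implementation; please refer to FLIP-427 and FLIP-429 for the real design.

// Illustrative sketch only -- CachedRangeReader is a made-up name for this
// discussion, not a class from ForSt / FLIP-427.
import org.apache.flink.core.fs.FSDataInputStream;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

/** Reads byte ranges of a remote file, preferring a local disk cache when present. */
public class CachedRangeReader {

    private final Path remoteFile;                    // e.g. an SST file on DFS/object storage
    private final java.nio.file.Path localCacheFile;  // local copy, if already downloaded
    private final ExecutorService ioExecutor;         // pool used for parallel remote reads

    public CachedRangeReader(Path remoteFile, java.nio.file.Path localCacheFile,
                             ExecutorService ioExecutor) {
        this.remoteFile = remoteFile;
        this.localCacheFile = localCacheFile;
        this.ioExecutor = ioExecutor;
    }

    /** Asynchronously reads [offset, offset + length) so the caller never blocks on remote I/O. */
    public CompletableFuture<byte[]> readAsync(long offset, int length) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                if (Files.exists(localCacheFile)) {
                    // Cache hit: a plain local read, comparable to the non-disaggregated setup.
                    try (InputStream in = Files.newInputStream(localCacheFile)) {
                        skipFully(in, offset);
                        return readFully(in, length);
                    }
                }
                // Cache miss: one range request against the remote Flink FileSystem.
                FileSystem fs = remoteFile.getFileSystem();
                try (FSDataInputStream in = fs.open(remoteFile)) {
                    in.seek(offset);
                    return readFully(in, length);
                }
            } catch (IOException e) {
                throw new RuntimeException("Failed to read " + remoteFile, e);
            }
        }, ioExecutor);
    }

    private static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                throw new IOException("Could not skip to the requested offset");
            }
            n -= skipped;
        }
    }

    private static byte[] readFully(InputStream in, int length) throws IOException {
        byte[] buf = new byte[length];
        int pos = 0;
        while (pos < length) {
            int read = in.read(buf, pos, length - pos);
            if (read < 0) {
                throw new IOException("Unexpected end of stream");
            }
            pos += read;
        }
        return buf;
    }
}

On a cache hit the read stays on the local disk; on a miss a single range request goes to the remote FileSystem, and the CompletableFuture keeps the calling task from blocking on that remote I/O.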
> On Mon, Mar 11, 2024 at 3:11 AM Jeyhun Karimov <je.kari...@gmail.com> wrote:
>
>> Hi Hangxiang,
>>
>> Thanks for the proposal. +1 for it.
>> I have a few comments.
>>
>> > Proposal 2 has additional JNI overhead, but the overhead is relatively
>> > negligible when weighed against the latency of remote I/O.
>>
>> - Is this argument true for all workloads? Or does this argument also
>> hold for workloads with many small files, which is quite a common case [1]?
>>
>> - Also, in many workloads the engine does not need the whole file, either
>> because the query forces it, or the file type supports efficient filtering
>> (e.g., ORC, Parquet, Arrow files), or simply because one file is "divided"
>> among multiple workers. In these cases, the engine spawns a huge amount of
>> scan-range requests to the file system to retrieve different parts of a
>> file. How would the proposed solution work with these workloads?
>>
>> - A similar question related to the above also applies to caching (I know
>> caching is the subject of FLIP-429; I am asking here because of the
>> related section in this FLIP).
>>
>> Regards,
>> Jeyhun
>>
>> [1] https://blog.min.io/challenge-big-data-small-files/
>>
>> On Thu, Mar 7, 2024 at 10:09 AM Hangxiang Yu <master...@gmail.com> wrote:
>>
>> > Hi devs,
>> >
>> > I'd like to start a discussion on a sub-FLIP of FLIP-423: Disaggregated
>> > State Storage and Management [1], which is joint work by Yuan Mei,
>> > Zakelly Lan, Jinzhong Li, Hangxiang Yu, Yanfei Lei and Feng Wang:
>> >
>> > - FLIP-427: Disaggregated State Store
>> >
>> > This FLIP introduces the initial version of the ForSt disaggregated
>> > state store.
>> >
>> > Please make sure you have read FLIP-423 [1] to know the whole story, and
>> > we'll discuss the details of FLIP-427 [2] under this mail. For the
>> > discussion of the overall architecture or topics related to multiple
>> > sub-FLIPs, please post in the previous mail [3].
>> >
>> > Looking forward to hearing from you!
>> >
>> > [1] https://cwiki.apache.org/confluence/x/R4p3EQ
>> >
>> > [2] https://cwiki.apache.org/confluence/x/T4p3EQ
>> >
>> > [3] https://lists.apache.org/thread/ct8smn6g9y0b8730z7rp9zfpnwmj8vf0
>> >
>> > Best,
>> >
>> > Hangxiang.
>
> --
> Best,
> Hangxiang.

--
Best,
Hangxiang.