Hi Roman,

Thanks for the proposal! This will make scheduling a lot more flexible for our use case.

Just a quick follow-up question about the number of new configs we’re adding here. The proposed configs provide a lot of flexibility, but at the expense of added complexity.

It seems like operators would either choose isolation for the cluster’s jobs or want to share the memory between jobs. I’m not sure I see the motivation to reserve only part of the memory for sharing and to allow jobs to choose whether they will share or be isolated.

I’m new to Flink, though, and have no operational experience. Is there a use case you have in mind for this kind of split configuration?

Thanks,
John

On Wed, Nov 9, 2022, at 08:17, Roman Khachatryan wrote:
> Hi Yanfei,
>
> Thanks, good questions
>
>> 1. Is shared-memory only for the state backend? If both
>> "taskmanager.memory.managed.shared-fraction: >0" and
>> "state.backend.rocksdb.memory.managed: false" are set at the same time,
>> will the shared-memory be wasted?
> Yes, shared memory is only for the state backend currently;
> If no job uses it then it will be wasted.
> Session cluster can not validate this configuration
> because the job configuration is not known in advance.
>
>> 2. It's said that "Jobs 4 and 5 will use the same 750Mb of unmanaged memory
>> and will compete with each other" in the example, how is the memory size of
>> unmanaged part calculated?
> It's calculated the same way as managed memory size currently,
> i.e. taskmanager.memory.managed.size *
> taskmanager.memory.managed.shared-fraction
> Separate parameters for unmanaged memory would be more clear.
> However, I doubt that this configuration would ever be used (I listed it
> just for completeness).
> So I'm not sure whether adding them would be justified.
> WDYT?
>
>> 3. For fine-grained-resource-management, the control
>> of cpuCores, taskHeapMemory can still work, right?
> Yes, for other resources fine-grained-resource-management should work.
>
>> And I am a little
>> worried that too many memory-about configuration options are complicated
>> for users to understand.
> I'm also worried about having too many options, but I don't see any better
> alternative.
> The existing users definitely shouldn't be affected,
> so there must be at least feature toggle ("shared-fraction").
> "share-scope" could potentially be replaced by some inference logic,
> but having it explicit seems less error-prone.
>
> Regards,
> Roman
>
>
> On Wed, Nov 9, 2022 at 3:50 AM Yanfei Lei <fredia...@gmail.com> wrote:
>
>> Hi Roman,
>> Thanks for the proposal, this allows State Backend to make better use of
>> memory.
>>
>> After reading the ticket, I'm curious about some points:
>>
>> 1. Is shared-memory only for the state backend? If both
>> "taskmanager.memory.managed.shared-fraction: >0" and
>> "state.backend.rocksdb.memory.managed: false" are set at the same time,
>> will the shared-memory be wasted?
>> 2. It's said that "Jobs 4 and 5 will use the same 750Mb of unmanaged memory
>> and will compete with each other" in the example, how is the memory size of
>> unmanaged part calculated?
>> 3. For fine-grained-resource-management, the control
>> of cpuCores, taskHeapMemory can still work, right? And I am a little
>> worried that too many memory-about configuration options are complicated
>> for users to understand.
>>
>> Regards,
>> Yanfei
>>
>> Roman Khachatryan <ro...@apache.org> wrote on Tue, Nov 8, 2022 at 23:22:
>>
>> > Hi everyone,
>> >
>> > I'd like to discuss sharing RocksDB memory across slots as proposed in
>> > FLINK-29928 [1].
>> >
>> > Since 1.10 / FLINK-7289 [2], it is possible to:
>> > - share these objects among RocksDB instances of the same slot
>> > - bound the total memory usage by all RocksDB instances of a TM
>> >
>> > However, the memory is divided between the slots equally (unless using
>> > fine-grained resource control). This is sub-optimal if some slots contain
>> > more memory intensive tasks than the others.
>> > Using fine-grained resource control is also often not an option because
>> > the workload might not be known in advance.
>> >
>> > The proposal is to widen the scope of sharing memory to TM, so that it
>> > can be shared across all RocksDB instances of that TM. That would reduce the
>> > overall memory consumption in exchange for resource isolation.
>> >
>> > Please see FLINK-29928 [1] for more details.
>> >
>> > Looking forward to feedback on that proposal.
>> >
>> > [1]
>> > https://issues.apache.org/jira/browse/FLINK-29928
>> > [2]
>> > https://issues.apache.org/jira/browse/FLINK-7289
>> >
>> > Regards,
>> > Roman
>> >
>>
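
For reference, the size calculation Roman gives in his answer to question 2 (taskmanager.memory.managed.size * taskmanager.memory.managed.shared-fraction) can be worked through with a small sketch. The helper name, the input numbers (3000 MB managed memory, a fraction of 0.25, 4 slots), and the assumption that the non-shared remainder is split equally between slots are all hypothetical; they are picked only so the shared part comes out to a round figure and are not taken from the ticket.

```python
def split_managed_memory(managed_size_mb, shared_fraction, num_slots):
    """Return (shared_mb, per_slot_mb) for one TaskManager.

    shared_mb follows the formula quoted in the thread:
        managed size * shared-fraction
    per_slot_mb ASSUMES the remaining managed memory is divided
    equally between slots; that part is an illustration only.
    """
    if not 0.0 <= shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be within [0, 1]")
    if num_slots <= 0:
        raise ValueError("num_slots must be positive")

    shared_mb = managed_size_mb * shared_fraction
    per_slot_mb = (managed_size_mb - shared_mb) / num_slots
    return shared_mb, per_slot_mb


# Hypothetical values, chosen only for illustration.
shared_mb, per_slot_mb = split_managed_memory(3000, 0.25, 4)
print(f"shared across the TM: {shared_mb} MB")
print(f"left per slot:        {per_slot_mb} MB")
```

With a fraction of 0 the shared part is empty and every slot keeps its full equal share, which matches the point that existing users should not be affected unless they opt in.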