Sorry, a correction: "To be clear, we need a container with more memory than the request, and to confirm that this pod has Guaranteed QoS." should read "To be clear, we need a container with more memory than process.size, and to confirm that this pod has Guaranteed QoS."
Thanks.
Richard Su

> On Dec 5, 2023, at 20:47, richard.su <richardsuc...@gmail.com> wrote:
>
> Hi Gyula, yes, this is a special case in our scenarios; sorry that it is
> hard to understand. We want to reserve some memory beyond the JobManager
> or TaskManager process. To be clear, we need a container with more memory
> than the request, and to confirm that this pod has Guaranteed QoS.
>
> This is because we hit the glibc problem inside the container with Flink
> jobs using RocksDB, and the reserved memory helps to ease this problem.
>
> So I hope the container resource request can be decoupled from the Flink
> configuration.
>
> With Flink's current implementation, this cannot be done.
>
> Thanks.
>
> Richard Su
>
>> On Dec 5, 2023, at 20:28, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>
>> Richard, I still don't understand why the current setup doesn't work for
>> you. According to
>> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/memory/mem_setup/
>> :
>>
>> The process memory config (which is what we configure) translates directly
>> into the container request size. With the new proposal you can set the
>> limit independently.
>>
>> What you write doesn't make sense to me:
>> "user wants to define a flinkdeployment with jobmanager has 1G memory
>> resources in container field but config jobmanager.memory.process.size as
>> 850m"
>>
>> If you want to have a 1G container, you simply set the memory request
>> (process.size) in the spec to 1G. Then you have 1G; there are other
>> configs for how this 1G is split inside the container for various
>> purposes, but these are all covered in detail by the Flink memory configs.
>>
>> Cheers
>> Gyula
>>
>> On Tue, Dec 5, 2023 at 1:06 PM richard.su <richardsuc...@gmail.com> wrote:
>>
>>> I think the new configuration could be:
>>>
>>> "kubernetes.taskmanager.memory.amount" and
>>> "kubernetes.jobmanager.memory.amount"
>>>
>>> since we can calculate the limit-factor from the difference between
>>> requests and limits.
>>>
>>> In native mode, we would no longer use process.size as the default
>>> memory, but would use this configuration to decouple the two.
>>>
>>> Thanks
>>>
>>> Richard Su
>>>
>>>> On Dec 5, 2023, at 19:22, richard.su <richardsuc...@gmail.com> wrote:
>>>>
>>>> Hi Gyula, in my opinion this will still use the FlinkDeployment's
>>>> resource field to set jobmanager.memory.process.size, and I have
>>>> described a case it does not cover:
>>>>
>>>> A user wants to define a FlinkDeployment whose JobManager has 1G of
>>>> memory in the container resource field, but configures
>>>> jobmanager.memory.process.size as 850m. This solution only improves
>>>> the user-facing config; it does make the config more intuitive and
>>>> easier, but it does not decouple the container resources from the
>>>> Flink configuration.
>>>>
>>>> So from my side, I think we need to add a new configuration to support
>>>> this proposal, and it needs more discussion.
>>>>
>>>> Thanks
>>>> Chaoran Su
>>>>
>>>>
>>>>> On Dec 5, 2023, at 18:28, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>>>
>>>>> This is the proposal according to FLINK-33548:
>>>>>
>>>>> spec:
>>>>>   taskManager:
>>>>>     resources:
>>>>>       requests:
>>>>>         memory: "64Mi"
>>>>>         cpu: "250m"
>>>>>       limits:
>>>>>         memory: "128Mi"
>>>>>         cpu: "500m"
>>>>>
>>>>> I honestly think this is much more intuitive and easier than using the
>>>>> podTemplate, which gets very complex immediately.
>>>>> Please tell me what use-case/setup is not covered by this improved spec.
>>>>>
>>>>> Unless there is a big limitation here I am still -1 for modifying the
>>>>> podTemplate logic and +1 for continuing with FLINK-33548.
>>>>>
>>>>> Gyula
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Dec 5, 2023 at 11:16 AM Surendra Singh Lilhore <
>>>>> surendralilh...@gmail.com> wrote:
>>>>>
>>>>>> Hi Gyula,
>>>>>>
>>>>>> FLINK-33548 proposes adding a new resource field to match the
>>>>>> Kubernetes pod resource configuration.
>>>>>> Here's my suggestion: instead of adding a new resource field, let's
>>>>>> use a pod template for more advanced resource setup. Adding a new
>>>>>> resource field might confuse users. This change can also help with
>>>>>> issues when users use Flink Kubernetes commands directly, without
>>>>>> the operator.
>>>>>>
>>>>>> Thanks
>>>>>> Surendra
>>>>>>
>>>>>>
>>>>>> On Tue, Dec 5, 2023 at 3:10 PM richard.su <richardsuc...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Sorry Gyula, let me explain point 2 further: if I avoid the
>>>>>>> override, I will get a JobManager pod whose resources are still
>>>>>>> consistent with "jobmanager.memory.process.size", but a
>>>>>>> FlinkDeployment with a resource larger than that.
>>>>>>>
>>>>>>> Thanks for your time.
>>>>>>> Richard Su
>>>>>>>
>>>>>>>> On Dec 5, 2023, at 17:13, richard.su <richardsuc...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Thank you for your time, Gyula. I have more questions about
>>>>>>>> FLINK-33548; we can discuss this further and make progress:
>>>>>>>>
>>>>>>>> 1. I agree with you about declaring resources in the
>>>>>>>> FlinkDeployment resource sections. But the Flink operator will
>>>>>>>> override "jobmanager.memory.process.size" and
>>>>>>>> "taskmanager.memory.process.size" whether or not I have set them
>>>>>>>> in the Flink configuration. If the user has configured all memory
>>>>>>>> attributes, the override leads to an error because the overall
>>>>>>>> memory computation becomes inconsistent.
>>>>>>>>
>>>>>>>> The override code is in the buildFrom method of
>>>>>>>> FlinkConfigManager.class, and applies to JobManagerSpec and
>>>>>>>> TaskManagerSpec.
>>>>>>>>
>>>>>>>> 2. If I modify the override code, I will still hit FLINK-24150,
>>>>>>>> because I only modified the Flink operator and not the
>>>>>>>> flink-kubernetes package. So I will get pod resources like
>>>>>>>> (cpu: 1c, memory: 1g) but container resources of
>>>>>>>> (cpu: 1c, memory: 850m), because I already set
>>>>>>>> jobmanager.memory.process.size to 850m.
>>>>>>>>
>>>>>>>> 3. Because of these two points, we need to give the podTemplate
>>>>>>>> higher priority. Otherwise we can refactor the Flink operator
>>>>>>>> code, which should introduce some new configuration to support
>>>>>>>> native mode.
>>>>>>>>
>>>>>>>> I think it would be better to introduce some configuration that
>>>>>>>> FlinkConfigManager.class can override using the resources of
>>>>>>>> JobManagerSpec and TaskManagerSpec.
>>>>>>>>
>>>>>>>> Deeper in the code of the flink-kubernetes package, we would use
>>>>>>>> this new configuration as the final container resources.
>>>>>>>>
>>>>>>>> Thanks for your time.
>>>>>>>> Richard Su
>>>>>>>>
>>>>>>>>> On Dec 5, 2023, at 16:45, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> As you can see in the Jira ticket, there hasn't been any
>>>>>>>>> progress; nobody has started to work on this yet.
>>>>>>>>>
>>>>>>>>> I personally don't think it's confusing to declare resources in
>>>>>>>>> the FlinkDeployment resource sections. It's well documented and
>>>>>>>>> has worked very well so far for most users.
>>>>>>>>> This is pretty common practice for Kubernetes.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Gyula
>>>>>>>>>
>>>>>>>>> On Tue, Dec 5, 2023 at 9:35 AM richard.su <richardsuc...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Gyula, has there been any progress on FLINK-33548? I would
>>>>>>>>>> like to join the discussion, but I haven't seen any discussion
>>>>>>>>>> at the URL.
>>>>>>>>>>
>>>>>>>>>> I also create FlinkDeployments with the Flink operator, which
>>>>>>>>>> does indeed override the process size with
>>>>>>>>>> TaskManagerSpec.resources or JobManagerSpec.resources. This is
>>>>>>>>>> really confusing, so I modified the Flink operator code to
>>>>>>>>>> avoid the override.
>>>>>>>>>>
>>>>>>>>>> Looking forward to your response.
>>>>>>>>>>
>>>>>>>>>> Thank you.
>>>>>>>>>> Richard Su
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Dec 5, 2023, at 16:22, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi!
>>>>>>>>>>>
>>>>>>>>>>> Please see the discussion in
>>>>>>>>>>> https://lists.apache.org/thread/6p5tk6obmk1qxf169so498z4vk8cg969
>>>>>>>>>>> and the ticket: https://issues.apache.org/jira/browse/FLINK-33548
>>>>>>>>>>>
>>>>>>>>>>> We should follow the approach outlined there. If you are
>>>>>>>>>>> interested, you are welcome to pick up the operator ticket.
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately your PR could be a large unexpected change for
>>>>>>>>>>> existing users, so we should not add it.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Gyula
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Dec 5, 2023 at 9:05 AM 苏超腾 <richardsuc...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>
>>>>>>>>>>>> I've encountered an issue while using Flink's native
>>>>>>>>>>>> Kubernetes mode. Despite setting resource limits in the pod
>>>>>>>>>>>> template, it appears that these limits and requests are not
>>>>>>>>>>>> considered during JobManager (JM) and TaskManager (TM) pod
>>>>>>>>>>>> deployment.
>>>>>>>>>>>>
>>>>>>>>>>>> I found that an issue had already been opened in Jira,
>>>>>>>>>>>> FLINK-24150, which raises almost the same questions that I
>>>>>>>>>>>> encountered.
>>>>>>>>>>>>
>>>>>>>>>>>> I agree that if the user has provided pod templates, we
>>>>>>>>>>>> should give them higher priority than the values Flink
>>>>>>>>>>>> calculates from its configuration.
>>>>>>>>>>>>
>>>>>>>>>>>> But this needs some discussion in our community, because it
>>>>>>>>>>>> relates to the following scenarios:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. If I want to create a pod with Guaranteed QoS and want the
>>>>>>>>>>>> memory of the Flink main container to be larger than the
>>>>>>>>>>>> process size of Flink, I cannot directly modify the
>>>>>>>>>>>> podTemplate (although we can use the limit factor, this will
>>>>>>>>>>>> cause the QoS to change from Guaranteed to Burstable).
>>>>>>>>>>>>
>>>>>>>>>>>> 2. If I want to create a pod with Burstable QoS, I don't want
>>>>>>>>>>>> to use the limit factor and want to directly configure the
>>>>>>>>>>>> request to be 50% of the limit, which cannot be modified.
>>>>>>>>>>>>
>>>>>>>>>>>> In order to meet these scenarios, I have submitted a pull
>>>>>>>>>>>> request: https://github.com/apache/flink/pull/23872
>>>>>>>>>>>>
>>>>>>>>>>>> The code is very simple and just needs someone to review it.
>>>>>>>>>>>> This PR can be cherry-picked to older versions, which will be
>>>>>>>>>>>> helpful.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I would appreciate any feedback on this.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for your time and contributions to the Flink project.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you,
>>>>>>>>>>>> chaoran.su
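
The QoS scenarios discussed in this thread can be sketched in a few lines of Python. This is only a simplified, single-container model, assuming (as described in the thread) that the container memory request equals the Flink process size and that the limit is the request multiplied by the limit factor. The function names `container_memory` and `qos_class` are hypothetical and are not part of Flink, the Flink operator, or Kubernetes.

```python
from typing import Optional, Tuple


def container_memory(process_size_mib: float,
                     limit_factor: float = 1.0) -> Tuple[float, float]:
    """Return (request, limit) in MiB under the thread's description:
    request = process size, limit = request * limit factor."""
    if limit_factor < 1.0:
        # A limit below the request is not a valid container spec.
        raise ValueError("limit factor must be >= 1.0")
    return process_size_mib, process_size_mib * limit_factor


def qos_class(cpu_request: Optional[float], cpu_limit: Optional[float],
              mem_request: Optional[float], mem_limit: Optional[float]) -> str:
    """Simplified Kubernetes QoS rules for a pod with one container."""
    if all(v is None for v in (cpu_request, cpu_limit, mem_request, mem_limit)):
        return "BestEffort"
    # When only a limit is given, the request defaults to the limit.
    if cpu_request is None:
        cpu_request = cpu_limit
    if mem_request is None:
        mem_request = mem_limit
    if (cpu_limit is not None and mem_limit is not None
            and cpu_request == cpu_limit and mem_request == mem_limit):
        return "Guaranteed"
    return "Burstable"


# Scenario 1 from the thread: process size 850 MiB, but more container memory
# is wanted. Raising only the limit through a limit factor makes the pod Burstable.
request, limit = container_memory(850, limit_factor=1.2)
with_limit_factor = qos_class(1.0, 1.0, request, limit)

# What the thread asks for: request == limit == 1024 MiB while process.size
# stays at 850 MiB, which keeps the pod Guaranteed.
decoupled = qos_class(1.0, 1.0, 1024, 1024)

print(with_limit_factor, decoupled)  # prints "Burstable Guaranteed"
```

In this model the only way to get a container larger than the process size is to raise the limit factor, which necessarily makes request and limit differ; that is the coupling the thread proposes to remove.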