Given that the limit-factor should not be less than 1, using the limit-factor could also work for memory.

> Why do we need a larger memory resource limit than request?

A typical use case I can imagine is the page cache. Having more page cache might improve performance, and it can be reclaimed when the Kubernetes node does not have enough memory.

I still believe that it is the user's responsibility to configure proper resources (memory and CPU) that are not too big, and to use the limit-factor so that the Flink job can benefit from burst resources.

Best,
Yang

spoon_lz <spoon...@126.com> wrote on Wed, Sep 1, 2021 at 8:12 PM:

> Yes, shrinking the requested memory will result in OOM. We do this because
> the user-created job provides an initial value (for example, 2 CPUs and
> 4096 MB of memory for the TaskManager). In most cases, the user will not change
> this value unless the task fails or there is an exception such as data
> delay. This results in a waste of memory for most simple ETL tasks. These
> isolated situations may not apply to more Flink users. We can adjust
> Kubernetes instead of Flink to solve the resource waste problem.
> Just adjusting the CPU value might be a more robust choice, and there are
> probably some scenarios for both decreasing the CPU request and increasing
> the CPU limit.
>
> Best,
> Zhuo
>
> On 09/1/2021 19:39, Yang Wang <danrtsey...@gmail.com> wrote:
>
> Hi Lz,
>
> Thanks for sharing your ideas.
>
> I have to admit that I prefer the limit factor to set the resource limit,
> not the percentage to set the resource request,
> because usually the resource request is configured or calculated by Flink,
> and it indicates the resources the user requires.
> It has the same semantics for all deployments (e.g. YARN, K8s). Especially
> for the memory resource, giving a discount
> for the resource request may cause OOM.
> BTW, I am wondering why users do not allocate fewer resources if they
> do not need them.
>
> @Denis Cosmin NUTIU <dnu...@bitdefender.com> I really appreciate that
> you want to work on this feature.
> Let's first reach a consensus
> about the implementation. Then opening a PR is welcome.
>
> Best,
> Yang
>
> spoon_lz <spoon...@126.com> wrote on Wed, Sep 1, 2021 at 4:36 PM:
>
>> Hi, everyone
>> I have some other ideas for Kubernetes resource settings, as described by
>> WangYang in [FLINK-15648], which increases the CPU limit by a certain
>> percentage to provide more computational performance for jobs. Should we
>> consider the alternative of shrinking the request to start more jobs, which
>> would improve cluster resource utilization? For example, for some
>> low-traffic tasks, we can even set the CPU request to 0 in extreme cases.
>> Both limit enlargement and request shrinkage may be required.
>>
>> Best,
>> Lz
>>
>> On 09/1/2021 16:06, Denis Cosmin NUTIU <dnu...@bitdefender.com> wrote:
>>
>> Hi Yang,
>>
>> I have limited knowledge of Flink internals, but I can try to implement
>> FLINK-15648 and open a PR on GitHub or send the patch via email. How
>> does that sound?
>> I'll sign the ICLA and switch to my personal address.
>>
>> Sincerely,
>> Denis
>>
>> On Wed, 2021-09-01 at 13:48 +0800, Yang Wang wrote:
>>
>> Great. If no one wants to work on this ticket FLINK-15648, I will try to
>> get this done in the next major release cycle (1.15).
>>
>> Best,
>> Yang
>>
>> Denis Cosmin NUTIU <dnu...@bitdefender.com> wrote on Tue, Aug 31, 2021 at 4:59 PM:
>>
>> Hi everyone,
>>
>> Thanks for getting back to me!
>>
>> > I think it would be nice if the task manager pods get their values
>> from the configuration file only if the pod templates don’t specify any
>> resources. That was the goal of supporting pod templates, right? Allowing
>> more custom scenarios without letting the configuration options get bloated.
>>
>> I think that's correct. In the current behavior Flink will override the
>> resource settings: "The memory and cpu resources(including requests and
>> limits) will be overwritten by Flink configuration options. All other
>> resources(e.g.
>> ephemeral-storage) will be retained." [1]. After reading the
>> comments from FLINK-15648 [2], I'm not sure that it can be done in a clean
>> manner with pod templates.
>>
>> > I think it is a good improvement to support different resource
>> requests and limits. And it is very useful especially for the CPU
>> resource since it heavily depends on the upstream workloads.
>>
>> I agree with you! I have limited knowledge of Flink internals, but the
>> kubernetes.jobmanager.limit-factor and kubernetes.taskmanager.limit-factor
>> options seem to be the right way to do it.
>>
>> [1] Native Kubernetes | Apache Flink
>> <https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#pod-template>
>> [2] [FLINK-15648] Support to configure limit for CPU and memory
>> requirement - ASF JIRA (apache.org)
>> <https://issues.apache.org/jira/browse/FLINK-15648>
>>
>> ------------------------------
>> *From:* Yang Wang <danrtsey...@gmail.com>
>> *Sent:* Tuesday, August 31, 2021 6:04 AM
>> *To:* Alexis Sarda-Espinosa <alexis.sarda-espin...@microfocus.com>
>> *Cc:* Denis Cosmin NUTIU <dnu...@bitdefender.com>; matth...@ververica.com;
>> user@flink.apache.org
>> *Subject:* Re: Deploying Flink on Kubernetes with fractional CPU and
>> different limits and requests
>>
>> Hi all,
>>
>> I think it is a good improvement to support different resource requests
>> and limits. And it is very useful
>> especially for the CPU resource since it heavily depends on the upstream
>> workloads.
>>
>> Actually, we (Alibaba) have introduced some internal config options to
>> support this feature. WDYT?
>>
>> // The prefix of Kubernetes resource limit factor. It should not be less than 1.
>> // The resource could be cpu, memory, ephemeral-storage and all other types
>> // supported by Kubernetes.
>> public static final String KUBERNETES_JOBMANAGER_RESOURCE_LIMIT_FACTOR_PREFIX =
>>     "kubernetes.jobmanager.limit-factor.";
>> public static final String KUBERNETES_TASKMANAGER_RESOURCE_LIMIT_FACTOR_PREFIX =
>>     "kubernetes.taskmanager.limit-factor.";
>>
>> BTW, we already have an old ticket for this feature [1].
>>
>> [1]. https://issues.apache.org/jira/browse/FLINK-15648
>>
>> Best,
>> Yang
>>
>> Alexis Sarda-Espinosa <alexis.sarda-espin...@microfocus.com>
>> wrote on Thu, Aug 26, 2021 at 10:04 PM:
>>
>> I think it would be nice if the task manager pods get their values from
>> the configuration file only if the pod templates don’t specify any
>> resources. That was the goal of supporting pod templates, right? Allowing
>> more custom scenarios without letting the configuration options get bloated.
>>
>> Regards,
>>
>> Alexis.
>>
>> *From:* Denis Cosmin NUTIU <dnu...@bitdefender.com>
>> *Sent:* Thursday, 26 August 2021 15:55
>> *To:* matth...@ververica.com
>> *Cc:* user@flink.apache.org; danrtsey...@gmail.com
>> *Subject:* Re: Deploying Flink on Kubernetes with fractional CPU and
>> different limits and requests
>>
>> Hi Matthias,
>>
>> Thanks for getting back to me and for your time!
>>
>> We have some Flink jobs deployed on Kubernetes and running kubectl top
>> pod gives the following result:
>>
>> NAME                  CPU(cores)   MEMORY(bytes)
>> aa-78c8cb77d4-zlmpg   8m           1410Mi
>> aa-taskmanager-2-2    32m          1066Mi
>> bb-5f7b65f95c-jwb7t   7m           1445Mi
>> bb-taskmanager-2-2    32m          1016Mi
>> cc-54d967b55d-b567x   11m          514Mi
>> cc-taskmanager-4-1    11m          496Mi
>> dd-6fbc6b8666-krhlx   10m          535Mi
>> dd-taskmanager-2-2    12m          522Mi
>> xx-6845cf7986-p45lq   53m          526Mi
>> xx-taskmanager-5-2    11m          507Mi
>>
>> During low workloads the jobs consume just about 100m CPU and during high
>> workloads the CPU consumption increases to 500m-1000m. Having the ability
>> to specify requests and limits separately would give us more deployment
>> flexibility.
>>
>> Sincerely,
>>
>> Denis
>>
>> On Thu, 2021-08-26 at 14:22 +0200, Matthias Pohl wrote:
>>
>> Hi Denis,
>>
>> I did a bit of digging: It looks like there is no way to specify them
>> independently. You can find documentation about pod templates for
>> TaskManager and JobManager [1]. But even there it states that for cpu and
>> memory, the resource specs are overwritten by the Flink configuration. The
>> code also reveals that limit and requests are set using the same value [2].
>>
>> I'm going to pull Yang Wang into this thread. I'm wondering whether there
>> is a reason for that or whether it makes sense to create a Jira issue
>> introducing more specific configuration parameters for limit and requests.
>>
>> Best,
>> Matthias
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#fields-overwritten-by-flink
>>
>> [2]
>> https://github.com/apache/flink/blob/f64261c91b195ecdcd99975b51de540db89a3f48/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/utils/KubernetesUtils.java#L324-L332
>>
>> On Thu, Aug 26, 2021 at 11:17 AM Denis Cosmin NUTIU <
>> dnu...@bitdefender.com> wrote:
>>
>> Hello,
>>
>> I've developed a Flink job and I'm trying to deploy it on a Kubernetes
>> cluster using Flink Native.
>>
>> Setting kubernetes.taskmanager.cpu=0.5 and
>> kubernetes.jobmanager.cpu=0.5 sets the requests and limits to 500m,
>> which is correct, but I'd like to set the requests and limits to
>> different values, something like:
>>
>>     resources:
>>       requests:
>>         memory: "1048Mi"
>>         cpu: "100m"
>>       limits:
>>         memory: "2096Mi"
>>         cpu: "1000m"
>>
>> I've tried using pod templates from Flink 1.13 and manually patching
>> the Kubernetes deployment file. The jobmanager gets spawned with the
>> correct resource requests and limits, but the taskmanagers get spawned
>> with the defaults:
>>
>>     Limits:
>>       cpu:     1
>>       memory:  1728Mi
>>     Requests:
>>       cpu:     1
>>       memory:  1728Mi
>>
>> Is there any way I could set the requests/limits for the CPU/Memory to
>> different values when deploying Flink in Kubernetes? If not, would it
>> make sense to request this as a feature?
>>
>> Thanks in advance!
>>
>> Denis
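
For illustration, the limit-factor idea discussed in this thread can be sketched in a few lines of Java. This is only a minimal sketch under assumptions: it assumes a limit is the request multiplied by a per-resource factor that must not be less than 1, as the comment on the proposed option prefixes suggests. The class and method names (LimitFactorSketch, limitFor, limitsFor) are hypothetical and are not Flink's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustration only: derives Kubernetes resource limits from requests
 * using a per-resource limit factor. Hypothetical names, not Flink code.
 */
final class LimitFactorSketch {

    /** Returns request * factor; rejects factors below 1 so that limit >= request. */
    static double limitFor(double request, double factor) {
        if (factor < 1.0) {
            throw new IllegalArgumentException("limit factor must be >= 1, got " + factor);
        }
        return request * factor;
    }

    /** Applies per-resource factors; a resource without a factor keeps limit == request. */
    static Map<String, Double> limitsFor(Map<String, Double> requests,
                                         Map<String, Double> factors) {
        Map<String, Double> limits = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : requests.entrySet()) {
            double factor = factors.getOrDefault(entry.getKey(), 1.0);
            limits.put(entry.getKey(), limitFor(entry.getValue(), factor));
        }
        return limits;
    }

    public static void main(String[] args) {
        // Requests taken from the example in the original question.
        Map<String, Double> requests = new LinkedHashMap<>();
        requests.put("cpu", 0.1);        // cores, i.e. 100m
        requests.put("memory", 1048.0);  // MiB

        // Assumed factors, chosen so the limits match the desired example.
        Map<String, Double> factors = new LinkedHashMap<>();
        factors.put("cpu", 10.0);
        factors.put("memory", 2.0);

        System.out.println(limitsFor(requests, factors));
    }
}
```

With a memory request of 1048Mi and an assumed factor of 2, the derived limit is 2096Mi, which matches the resources block in the original question. A CPU request of 100m with a factor of 10 corresponds to the 1000m limit in the same way. Leaving a resource without a factor keeps its limit equal to its request, which is the behavior Matthias describes for the existing code.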