Hi Alexis,

Thanks for sharing more thoughts about resource configuration. Your suggestions make a lot of sense to me. I believe this could also help others, especially those who are more familiar with K8s and tend to use pod templates as much as possible.

I have created a ticket for this feature[1].

[1]. https://issues.apache.org/jira/browse/FLINK-24150

Best,
Yang

Alexis Sarda-Espinosa <alexis.sarda-espin...@microfocus.com> wrote on Fri, Sep 3, 2021 at 5:01 PM:

> Hi Yang,
>
> I understand the issue, and yes, if Flink memory must be specified in the configuration anyway, it’s probably better to leave memory configuration in the templates empty.
>
> For the CPU case I still think the template’s requests/limits should have priority if they are specified. The factor could still be used if the template doesn’t specify anything. I’m not sure if it would be entirely intuitive, but the logic could be something like this:
>
> 1. To choose CPU request
>    1. Read pod template first
>    2. If template doesn’t have anything, read from kubernetes.taskmanager.cpu
>    3. If configuration is not specified, fall back to default
> 2. To choose CPU limit
>    1. Read from template first
>    2. If template doesn’t have anything, apply factor to what was chosen in step 1, where the default factor is 1.
>
> Regards,
> Alexis.
>
> *From:* Yang Wang <danrtsey...@gmail.com>
> *Sent:* Friday, September 3, 2021 08:09
> *To:* Alexis Sarda-Espinosa <alexis.sarda-espin...@microfocus.com>
> *Cc:* spoon_lz <spoon...@126.com>; Denis Cosmin NUTIU <dnu...@bitdefender.com>; matth...@ververica.com; user@flink.apache.org
> *Subject:* Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests
>
> Hi Alexis
>
> Thanks for your valuable inputs.
>
> First, I want to share why Flink has to overwrite the resources which are defined in the pod template. You can find the fields that will be overwritten by Flink here[1]. I think the major reason is that Flink needs to ensure consistency between the Flink configuration (taskmanager.memory.process.size, kubernetes.taskmanager.cpu) and the pod template resource settings.
> Since users can specify the total process memory or detailed memory[2], Flink will calculate the pod resources internally. If we allowed users to specify the resources via the pod template, they would have to guarantee the configuration consistency themselves, especially when they specify the detailed memory (e.g. heap, managed, off-heap, etc.). I believe it is a new burden for them.
>
> For the limit-factor, you are right that factors aren’t linear. But I think the factor is more flexible than the absolute value. A bigger pod usually could use more burst resources. Moreover, I do not suggest setting a limit-factor for memory since it does not bring much benefit. As a comparison, the burst CPU resources could help a lot for the performance.
>
> [1]. https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#pod-template
> [2]. https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/memory/mem_setup_tm/#detailed-memory-model
>
> @spoon_lz <spoon...@126.com> You are right. The limit-factor should be greater than or equal to 1. And the default value is 1.
>
> Best,
> Yang
>
> Alexis Sarda-Espinosa <alexis.sarda-espin...@microfocus.com> wrote on Thu, Sep 2, 2021 at 8:20 PM:
>
> Just to provide my opinion, I find the idea of factors unintuitive for this specific case. When I’m working with Kubernetes resources and sizing, I have to think in absolute terms for all pods and define requests and limits with concrete values. Using factors for Flink means that I have to think differently for my Flink resources, and if I’m using pod templates, it makes this switch more jarring because I define what is essentially another Kubernetes resource that I’m familiar with, but some of the values in my template are ignored. Additionally, if I understand correctly, factors aren’t linear, right?
> If someone specifies a 1GiB request with a factor of 1.5, they only get 500MiB on top, but if they specify 10GiB, suddenly the limit goes all the way up to 15GiB.
>
> Regards,
> Alexis.
>
> *From:* spoon_lz <spoon...@126.com>
> *Sent:* Thursday, September 2, 2021 14:12
> *To:* Yang Wang <danrtsey...@gmail.com>
> *Cc:* Denis Cosmin NUTIU <dnu...@bitdefender.com>; Alexis Sarda-Espinosa <alexis.sarda-espin...@microfocus.com>; matth...@ververica.com; user@flink.apache.org
> *Subject:* Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests
>
> Hi Yang,
>
> I agree with you, but I think the limit-factor should be greater than or equal to 1, and defaulting to 1 is a better choice.
>
> If the default value is 1.5, the memory limit will exceed the actual physical memory of a node, which may result in OOM, machine downtime, or random pod death if the node runs full.
>
> For jobs that require it, this value can be increased appropriately.
>
> Best,
> Zhuo
>
> On 09/2/2021 11:50, Yang Wang <danrtsey...@gmail.com> wrote:
>
> Given that the limit-factor should be greater than 1, then using the limit-factor could also work for memory.
>
> > Why do we need a larger memory resource limit than request?
>
> A typical use case I could imagine is the page cache. Having more page cache might improve the performance. And it could be reclaimed when the Kubernetes node does not have enough memory.
>
> I still believe that it is the user's responsibility to configure proper resources (memory and CPU), not too big, and to use the limit-factor so that the Flink job can benefit from the burst resources.
>
> Best,
> Yang
>
> spoon_lz <spoon...@126.com> wrote on Wed, Sep 1, 2021 at 8:12 PM:
>
> Yes, shrinking the requested memory will result in OOM. We do this because the user-created job provides an initial value (for example, 2 CPUs and 4096MB of memory for TaskManager).
> In most cases, the user will not change this value unless the task fails or there is an exception such as data delay. This results in a waste of memory for most simple ETL tasks. These isolated situations may not apply to more Flink users. We can adjust Kubernetes instead of Flink to solve the resource waste problem.
>
> Just adjusting the CPU value might be a more robust choice, and there are probably some scenarios for both decreasing the CPU request and increasing the CPU limit.
>
> Best,
> Zhuo
>
> On 09/1/2021 19:39, Yang Wang <danrtsey...@gmail.com> wrote:
>
> Hi Lz,
>
> Thanks for sharing your ideas.
>
> I have to admit that I prefer the limit factor to set the resource limit, not the percentage to set the resource request. Usually the resource request is configured or calculated by Flink, and it indicates the resources the user requires. It has the same semantics for all deployments (e.g. Yarn, K8s). Especially for the memory resource, giving a discount for the resource request may cause OOM.
>
> BTW, I am wondering why the users do not allocate fewer resources if they do not need them.
>
> @Denis Cosmin NUTIU <dnu...@bitdefender.com> I really appreciate that you want to work on this feature. Let's first reach a consensus about the implementation. After that, opening a PR is welcome.
>
> Best,
> Yang
>
> spoon_lz <spoon...@126.com> wrote on Wed, Sep 1, 2021 at 4:36 PM:
>
> Hi, everyone
>
> I have some other ideas for Kubernetes resource settings, as described by WangYang in [flink-15648], which increase the CPU limit by a certain percentage to provide more computational performance for jobs. Should we consider the alternative of shrinking the request to start more jobs, which would improve cluster resource utilization? For example, for some low-traffic tasks, we can even set the CPU request to 0 in extreme cases.
> Both limit enlargement and request shrinkage may be required.
>
> Best,
> Lz
>
> On 09/1/2021 16:06, Denis Cosmin NUTIU <dnu...@bitdefender.com> wrote:
>
> Hi Yang,
>
> I have limited Flink internals knowledge, but I can try to implement FLINK-15648 and open up a PR on GitHub or send the patch via email. How does that sound?
>
> I'll sign the ICLA and switch to my personal address.
>
> Sincerely,
> Denis
>
> On Wed, 2021-09-01 at 13:48 +0800, Yang Wang wrote:
>
> Great. If no one wants to work on this ticket FLINK-15648, I will try to get this done in the next major release cycle (1.15).
>
> Best,
> Yang
>
> Denis Cosmin NUTIU <dnu...@bitdefender.com> wrote on Tue, Aug 31, 2021 at 4:59 PM:
>
> Hi everyone,
>
> Thanks for getting back to me!
>
> > I think it would be nice if the task manager pods get their values from the configuration file only if the pod templates don’t specify any resources. That was the goal of supporting pod templates, right? Allowing more custom scenarios without letting the configuration options get bloated.
>
> I think that's correct. In the current behavior Flink will override the resources settings: "The memory and cpu resources(including requests and limits) will be overwritten by Flink configuration options. All other resources(e.g. ephemeral-storage) will be retained."[1]. After reading the comments from FLINK-15648[2], I'm not sure that it can be done in a clean manner with pod templates.
>
> > I think it is a good improvement to support different resource requests and limits. And it is very useful especially for the CPU resource since it heavily depends on the upstream workloads.
>
> I agree with you! I have limited knowledge of Flink internals but the kubernetes.jobmanager.limit-factor and kubernetes.taskmanager.limit-factor seem to be the right way to do it.
>
> [1] Native Kubernetes | Apache Flink <https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#pod-template>
> [2] [FLINK-15648] Support to configure limit for CPU and memory requirement - ASF JIRA (apache.org) <https://issues.apache.org/jira/browse/FLINK-15648>
>
> ------------------------------
>
> *From:* Yang Wang <danrtsey...@gmail.com>
> *Sent:* Tuesday, August 31, 2021 6:04 AM
> *To:* Alexis Sarda-Espinosa <alexis.sarda-espin...@microfocus.com>
> *Cc:* Denis Cosmin NUTIU <dnu...@bitdefender.com>; matth...@ververica.com; user@flink.apache.org
> *Subject:* Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests
>
> Hi all,
>
> I think it is a good improvement to support different resource requests and limits. And it is very useful especially for the CPU resource since it heavily depends on the upstream workloads.
>
> Actually, we (Alibaba) have introduced some internal config options to support this feature. WDYT?
>
> // The prefix of Kubernetes resource limit factor. It should not be less than 1. The resource
> // could be cpu, memory, ephemeral-storage and all other types supported by Kubernetes.
> public static final String KUBERNETES_JOBMANAGER_RESOURCE_LIMIT_FACTOR_PREFIX =
>     "kubernetes.jobmanager.limit-factor.";
> public static final String KUBERNETES_TASKMANAGER_RESOURCE_LIMIT_FACTOR_PREFIX =
>     "kubernetes.taskmanager.limit-factor.";
>
> BTW, we already have an old ticket for this feature[1].
>
> [1]. https://issues.apache.org/jira/browse/FLINK-15648
>
> Best,
> Yang
>
> Alexis Sarda-Espinosa <alexis.sarda-espin...@microfocus.com> wrote on Thu, Aug 26, 2021 at 10:04 PM:
>
> I think it would be nice if the task manager pods get their values from the configuration file only if the pod templates don’t specify any resources.
> That was the goal of supporting pod templates, right? Allowing more custom scenarios without letting the configuration options get bloated.
>
> Regards,
> Alexis.
>
> *From:* Denis Cosmin NUTIU <dnu...@bitdefender.com>
> *Sent:* Thursday, August 26, 2021 15:55
> *To:* matth...@ververica.com
> *Cc:* user@flink.apache.org; danrtsey...@gmail.com
> *Subject:* Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests
>
> Hi Matthias,
>
> Thanks for getting back to me and for your time!
>
> We have some Flink jobs deployed on Kubernetes and running kubectl top pod gives the following result:
>
> NAME                  CPU(cores)   MEMORY(bytes)
> aa-78c8cb77d4-zlmpg   8m           1410Mi
> aa-taskmanager-2-2    32m          1066Mi
> bb-5f7b65f95c-jwb7t   7m           1445Mi
> bb-taskmanager-2-2    32m          1016Mi
> cc-54d967b55d-b567x   11m          514Mi
> cc-taskmanager-4-1    11m          496Mi
> dd-6fbc6b8666-krhlx   10m          535Mi
> dd-taskmanager-2-2    12m          522Mi
> xx-6845cf7986-p45lq   53m          526Mi
> xx-taskmanager-5-2    11m          507Mi
>
> During low workloads the jobs consume just about 100m CPU and during high workloads the CPU consumption increases to 500m-1000m. Having the ability to specify requests and limits separately would give us more deployment flexibility.
>
> Sincerely,
> Denis
>
> On Thu, 2021-08-26 at 14:22 +0200, Matthias Pohl wrote:
>
> Hi Denis,
>
> I did a bit of digging: It looks like there is no way to specify them independently. You can find documentation about pod templates for TaskManager and JobManager [1]. But even there it states that for cpu and memory, the resource specs are overwritten by the Flink configuration. The code also reveals that limit and requests are set using the same value [2].
>
> I'm going to pull Yang Wang into this thread. I'm wondering whether there is a reason for that or whether it makes sense to create a Jira issue introducing more specific configuration parameters for limit and requests.
>
> Best,
> Matthias
>
> [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#fields-overwritten-by-flink
> [2] https://github.com/apache/flink/blob/f64261c91b195ecdcd99975b51de540db89a3f48/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/utils/KubernetesUtils.java#L324-L332
>
> On Thu, Aug 26, 2021 at 11:17 AM Denis Cosmin NUTIU <dnu...@bitdefender.com> wrote:
>
> Hello,
>
> I've developed a Flink job and I'm trying to deploy it on a Kubernetes cluster using Flink Native.
>
> Setting kubernetes.taskmanager.cpu=0.5 and kubernetes.jobmanager.cpu=0.5 sets the requests and limits to 500m, which is correct, but I'd like to set the requests and limits to different values, something like:
>
> resources:
>   requests:
>     memory: "1048Mi"
>     cpu: "100m"
>   limits:
>     memory: "2096Mi"
>     cpu: "1000m"
>
> I've tried using pod templates from Flink 1.13 and manually patching the Kubernetes deployment file. The jobmanager gets spawned with the correct resource requests and limits, but the taskmanagers get spawned with the defaults:
>
> Limits:
>   cpu: 1
>   memory: 1728Mi
> Requests:
>   cpu: 1
>   memory: 1728Mi
>
> Is there any way I could set the requests/limits for the CPU/Memory to different values when deploying Flink in Kubernetes? If not, would it make sense to request this as a feature?
>
> Thanks in advance!
>
> Denis
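
For readers following the thread, the CPU precedence proposed in the most recent reply (pod template first, then kubernetes.taskmanager.cpu, then a default; limit taken from the template or else derived as request times limit-factor) could be sketched roughly as below. This is only an illustration of the proposal, not Flink's actual implementation: the class name, method names, and the DEFAULT_CPU value are hypothetical, and the default of 1.0 is merely an assumption that mirrors the "cpu: 1" seen in the original question.

```java
import java.util.Optional;

/** Illustrative sketch of the proposed CPU request/limit precedence; not Flink code. */
public class CpuResourceSketch {

    // Assumption: placeholder default, mirroring the "cpu: 1" the taskmanagers
    // in the original question were spawned with.
    static final double DEFAULT_CPU = 1.0;

    // The thread settles on a factor that is >= 1 and defaults to 1.
    static final double DEFAULT_LIMIT_FACTOR = 1.0;

    /** Step 1: pod template first, then the configured CPU, then the default. */
    static double chooseCpuRequest(Optional<Double> templateRequest,
                                   Optional<Double> configuredCpu) {
        if (templateRequest.isPresent()) {
            return templateRequest.get();
        }
        return configuredCpu.orElse(DEFAULT_CPU);
    }

    /** Step 2: pod template first, otherwise the chosen request times the factor. */
    static double chooseCpuLimit(Optional<Double> templateLimit,
                                 double request,
                                 double limitFactor) {
        if (limitFactor < 1.0) {
            throw new IllegalArgumentException("limit factor must be >= 1");
        }
        return templateLimit.orElse(request * limitFactor);
    }

    public static void main(String[] args) {
        // Template specifies both values (100m request, 1000m limit): template wins.
        double r1 = chooseCpuRequest(Optional.of(0.1), Optional.of(0.5));
        double l1 = chooseCpuLimit(Optional.of(1.0), r1, DEFAULT_LIMIT_FACTOR);
        System.out.println("template: request=" + r1 + " limit=" + l1);

        // Template is empty: fall back to the configured CPU and apply the factor.
        double r2 = chooseCpuRequest(Optional.empty(), Optional.of(0.5));
        double l2 = chooseCpuLimit(Optional.empty(), r2, 2.0);
        System.out.println("config:   request=" + r2 + " limit=" + l2);
    }
}
```

With the default factor of 1 and an empty template, the sketch reproduces the behavior described at the start of the thread, where request and limit are set to the same value.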