Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

Yang Wang Thu, 02 Sep 2021 23:09:09 -0700

Hi Alexis

Thanks for your valuable inputs.


First, I want to share why Flink has to overwrite the resources which are
defined in the pod template. You can find the fields that will be
overwritten by Flink here[1]. I think the major reason is that Flink needs
to ensure consistency between the Flink configuration
(taskmanager.memory.process.size, kubernetes.taskmanager.cpu)
and the pod template resource settings. Since users can specify either the
total process memory or the detailed memory components[2], Flink calculates
the pod resources internally. If we allowed users to specify the resources
via the pod template, they would have to guarantee that the configuration
stays consistent, especially when they specify the detailed memory (e.g.
heap, managed, off-heap, etc.). I believe that would be a new burden for them.

For the limit-factor, you are right that factors aren’t linear. But I think
a factor is more flexible than an absolute value. A bigger pod usually
can use more burst resources. Moreover, I do not suggest setting a
limit-factor for memory since it does not bring much benefit. By
comparison, burst CPU resources can help performance a lot.
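
To make the factor arithmetic concrete, here is a minimal Python sketch. It is not Flink code: the function name `limit_from_request` is made up for illustration, and it assumes the limit is simply the request multiplied by the factor, as the proposal is described in this thread.

```python
def limit_from_request(request: float, factor: float) -> float:
    """Return the resource limit implied by a request and a limit factor.

    Assumption: limit = request * factor (illustrative, not Flink's code).
    """
    return request * factor


# Memory in MiB, using the numbers from Alexis's example (factor 1.5).
for request_mib in (1024, 10240):
    limit_mib = limit_from_request(request_mib, 1.5)
    headroom_mib = limit_mib - request_mib
    print(f"request={request_mib}MiB limit={limit_mib:.0f}MiB "
          f"headroom={headroom_mib:.0f}MiB")

# CPU in millicores: a 100m request with a factor of 10 allows bursts to 1000m.
print(f"cpu limit={limit_from_request(100, 10):.0f}m")
```

The headroom scales with the request (512 MiB for a 1 GiB request, 5 GiB for a 10 GiB request), which is the effect Alexis describes.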

[1].
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#pod-template
[2].
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/memory/mem_setup_tm/#detailed-memory-model


@spoon_lz <spoon...@126.com> You are right. The limit-factor should be
greater than or equal to 1. And the default value is 1.
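
As a rough sketch of that rule, the following Python snippet reads a factor from a flat key/value configuration, falls back to 1, and rejects values below 1. It is not Flink code: the helper `resolve_limit_factor` is hypothetical, and the key is built from the `kubernetes.taskmanager.limit-factor.` prefix proposed earlier in this thread, so the final option names may differ.

```python
# Prefix as proposed in this thread; the released option name may differ.
TASKMANAGER_LIMIT_FACTOR_PREFIX = "kubernetes.taskmanager.limit-factor."
DEFAULT_LIMIT_FACTOR = 1.0


def resolve_limit_factor(config: dict, resource: str) -> float:
    """Look up the limit factor for a resource, defaulting to 1.0.

    Hypothetical helper: raises ValueError if the factor is below 1,
    since a limit smaller than the request is not valid in Kubernetes.
    """
    key = TASKMANAGER_LIMIT_FACTOR_PREFIX + resource
    factor = float(config.get(key, DEFAULT_LIMIT_FACTOR))
    if factor < 1.0:
        raise ValueError(f"{key} must be >= 1, got {factor}")
    return factor


config = {"kubernetes.taskmanager.limit-factor.cpu": "2.0"}
print(resolve_limit_factor(config, "cpu"))     # explicitly configured
print(resolve_limit_factor(config, "memory"))  # falls back to the default
```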


Best,
Yang

Alexis Sarda-Espinosa <alexis.sarda-espin...@microfocus.com> wrote on
Thu, Sep 2, 2021 at 8:20 PM:

> Just to provide my opinion, I find the idea of factors unintuitive for
> this specific case. When I’m working with Kubernetes resources and sizing,
> I have to think in absolute terms for all pods and define requests and
> limits with concrete values. Using factors for Flink means that I have to
> think differently for my Flink resources, and if I’m using pod templates,
> it makes this switch more jarring because I define what is essentially
> another Kubernetes resource that I’m familiar with, but some of the values
> in my template are ignored. Additionally, if I understand correctly,
> factors aren’t linear, right? If someone specifies a 1GiB request with a
> factor of 1.5, they only get 500MiB on top, but if they specify 10GiB,
> suddenly the limit goes all the way up to 15GiB.
>
>
>
> Regards,
>
> Alexis.
>
>
>
> *From:* spoon_lz <spoon...@126.com>
> *Sent:* Thursday, September 2, 2021 14:12
> *To:* Yang Wang <danrtsey...@gmail.com>
> *Cc:* Denis Cosmin NUTIU <dnu...@bitdefender.com>; Alexis Sarda-Espinosa <
> alexis.sarda-espin...@microfocus.com>; matth...@ververica.com;
> user@flink.apache.org
> *Subject:* Re: Deploying Flink on Kubernetes with fractional CPU and
> different limits and requests
>
>
>
> Hi Yang,
>
> I agree with you, but I think the limit-factor should be greater than or
> equal to 1, and defaulting to 1 is a better choice.
>
> If the default value is 1.5, the sum of memory limits can exceed the actual
> physical memory of a node, which may result in OOM, machine downtime, or
> random pod death if the node runs full.
>
> For jobs that need it, this value can be increased appropriately.
>
>
>
> Best,
>
> Zhuo
>
>
>
>
>
> On 09/2/2021 11:50, Yang Wang <danrtsey...@gmail.com> wrote:
>
> Given that the limit-factor should be greater than 1, then using the
> limit-factor could also work for memory.
>
>
>
> > Why do we need a larger memory resource limit than request?
>
> A typical use case I could imagine is the page cache. Having more page
> cache might improve the performance.
>
> It can be reclaimed when the Kubernetes node does not have enough
> memory.
>
>
>
> I still believe that it is the user's responsibility to configure proper
> resources (memory and CPU) that are not too big, and to use the
> limit-factor so that the Flink job can benefit from burst resources.
>
>
>
>
>
> Best,
>
> Yang
>
>
>
> spoon_lz <spoon...@126.com> wrote on Wed, Sep 1, 2021 at 8:12 PM:
>
> Yes, shrinking the requested memory will result in OOM. We do this because
> the user-created job provides an initial value (for example, 2 cpus and
> 4096MB of memory for TaskManager). In most cases, the user will not change
> this value unless the task fails or there is an exception such as data
> delay. This results in a waste of memory for most simple ETL tasks. These
> isolated situations may not apply to more Flink users. We can adjust
> Kubernetes instead of Flink to solve the resource waste problem.
>
> Just adjusting the CPU value might be a more robust choice, and there are
> probably some scenarios for both decreasing the CPU request and increasing
> the CPU limit.
>
>
>
> Best,
>
> Zhuo
>
>
>
> On 09/1/2021 19:39, Yang Wang <danrtsey...@gmail.com> wrote:
>
> Hi Lz,
>
> Thanks for sharing your ideas.
>
>
> I have to admit that I prefer using a limit factor to set the resource
> limit, rather than a percentage to set the resource request, because the
> resource request is usually configured or calculated by Flink and it
> indicates the resources the user requires.
>
> It has the same semantics for all deployments (e.g. YARN, K8s). Especially
> for memory, discounting the resource request may cause OOM.
>
> BTW, I am wondering why users do not allocate fewer resources if they
> do not need them.
>
>
>
> @Denis Cosmin NUTIU <dnu...@bitdefender.com> I really appreciate that
> you want to work on this feature. Let's first reach a consensus
> about the implementation. After that, opening a PR is welcome.
>
>
>
>
>
> Best,
>
> Yang
>
>
>
>
>
> spoon_lz <spoon...@126.com> wrote on Wed, Sep 1, 2021 at 4:36 PM:
>
>
>
> Hi everyone,
>
> I have some other ideas for Kubernetes resource settings. Wang Yang
> describes in FLINK-15648 an approach which increases the CPU limit by a
> certain percentage to provide more computational performance for jobs.
> Should we also consider the alternative of shrinking the request so that
> more jobs can be started, which would improve cluster resource utilization?
> For example, for some low-traffic tasks, we could even set the CPU request
> to 0 in extreme cases. Both limit enlargement and request shrinkage may be
> required.
>
>
>
> Best,
>
> Lz
>
> On 09/1/2021 16:06, Denis Cosmin NUTIU <dnu...@bitdefender.com> wrote:
>
> Hi Yang,
>
>
>
> I have limited Flink internals knowledge, but I can try to implement
> FLINK-15648 and open up a PR on GitHub or send the patch via email. How
> does that sound?
>
> I'll sign the ICLA and switch to my personal address.
>
>
>
> Sincerely,
>
> Denis
>
>
>
> On Wed, 2021-09-01 at 13:48 +0800, Yang Wang wrote:
>
> Great. If no one wants to work on this ticket FLINK-15648, I will try to
> get this done in the next major release cycle(1.15).
>
>
>
> Best,
>
> Yang
>
>
>
> Denis Cosmin NUTIU <dnu...@bitdefender.com> wrote on Tue, Aug 31, 2021 at 4:59 PM:
>
> Hi everyone,
>
>
>
> Thanks for getting back to me!
>
>
>
> >  I think it would be nice if the task manager pods get their values from
> the configuration file only if the pod templates don’t specify any
> resources. That was the goal of supporting pod templates, right? Allowing
> more custom scenarios without letting the configuration options get bloated.
>
>
>
> I think that's correct. Currently, Flink overrides the resource
> settings: "The memory and cpu resources(including requests and
> limits) will be overwritten by Flink configuration options. All other
> resources(e.g. ephemeral-storage) will be retained."[1] After reading the
> comments on FLINK-15648[2], I'm not sure that it can be done in a clean
> manner with pod templates.
>
>
>
> > I think it is a good improvement to support different resource requests
> and limits. And it is very useful especially for the CPU resource since it
> heavily depends on the upstream workloads.
>
>
>
> I agree with you! I have limited knowledge of Flink internals, but
> kubernetes.jobmanager.limit-factor and kubernetes.taskmanager.limit-factor
> seem to be the right way to do it.
>
>
>
> [1] Native Kubernetes | Apache Flink
> <https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#pod-template>
>
> [2] [FLINK-15648] Support to configure limit for CPU and memory
> requirement - ASF JIRA (apache.org)
> <https://issues.apache.org/jira/browse/FLINK-15648>
>
>
> ------------------------------
>
> *From:* Yang Wang <danrtsey...@gmail.com>
> *Sent:* Tuesday, August 31, 2021 6:04 AM
> *To:* Alexis Sarda-Espinosa <alexis.sarda-espin...@microfocus.com>
> *Cc:* Denis Cosmin NUTIU <dnu...@bitdefender.com>; matth...@ververica.com
>  <matth...@ververica.com>; user@flink.apache.org <user@flink.apache.org>
> *Subject:* Re: Deploying Flink on Kubernetes with fractional CPU and
> different limits and requests
>
>
>
> Hi all,
>
>
>
> I think it is a good improvement to support different resource requests
> and limits. And it is very useful
>
> especially for the CPU resource since it heavily depends on the upstream
> workloads.
>
>
>
> Actually, we (Alibaba) have introduced some internal config options to
> support this feature. WDYT?
>
>
>
> // The prefix of Kubernetes resource limit factor. It should not be less
> // than 1. The resource could be cpu, memory, ephemeral-storage and all
> // other types supported by Kubernetes.
> public static final String KUBERNETES_JOBMANAGER_RESOURCE_LIMIT_FACTOR_PREFIX =
>         "kubernetes.jobmanager.limit-factor.";
> public static final String KUBERNETES_TASKMANAGER_RESOURCE_LIMIT_FACTOR_PREFIX =
>         "kubernetes.taskmanager.limit-factor.";
>
>
>
> BTW, we already have an old ticket for this feature[1].
>
>
>
>
>
> [1]. https://issues.apache.org/jira/browse/FLINK-15648
>
>
>
> Best,
>
> Yang
>
>
>
> Alexis Sarda-Espinosa <alexis.sarda-espin...@microfocus.com> wrote on
> Thu, Aug 26, 2021 at 10:04 PM:
>
> I think it would be nice if the task manager pods get their values from
> the configuration file only if the pod templates don’t specify any
> resources. That was the goal of supporting pod templates, right? Allowing
> more custom scenarios without letting the configuration options get bloated.
>
>
>
> Regards,
>
> Alexis.
>
>
>
> *From:* Denis Cosmin NUTIU <dnu...@bitdefender.com>
> *Sent:* Thursday, August 26, 2021 15:55
> *To:* matth...@ververica.com
> *Cc:* user@flink.apache.org; danrtsey...@gmail.com
> *Subject:* Re: Deploying Flink on Kubernetes with fractional CPU and
> different limits and requests
>
>
>
> Hi Matthias,
>
>
>
> Thanks for getting back to me and for your time!
>
>
>
> We have some Flink jobs deployed on Kubernetes and running kubectl top pod
> gives the following result:
>
>
>
>
> NAME                   CPU(cores)   MEMORY(bytes)
> aa-78c8cb77d4-zlmpg    8m           1410Mi
> aa-taskmanager-2-2     32m          1066Mi
> bb-5f7b65f95c-jwb7t    7m           1445Mi
> bb-taskmanager-2-2     32m          1016Mi
> cc-54d967b55d-b567x    11m          514Mi
> cc-taskmanager-4-1     11m          496Mi
> dd-6fbc6b8666-krhlx    10m          535Mi
> dd-taskmanager-2-2     12m          522Mi
> xx-6845cf7986-p45lq    53m          526Mi
> xx-taskmanager-5-2     11m          507Mi
>
>
>
> During low workloads the jobs consume just about 100m CPU, and during high
> workloads the CPU consumption increases to 500m-1000m. Having the ability
> to specify requests and limits separately would give us more deployment
> flexibility.
>
>
>
> Sincerely,
>
> Denis
>
>
>
> On Thu, 2021-08-26 at 14:22 +0200, Matthias Pohl wrote:
>
> Hi Denis,
>
> I did a bit of digging: It looks like there is no way to specify them
> independently. You can find documentation about pod templates for
> TaskManager and JobManager [1]. But even there it states that for cpu and
> memory, the resource specs are overwritten by the Flink configuration. The
> code also reveals that limit and requests are set using the same value [2].
>
>
>
> I'm going to pull Yang Wang into this thread. I'm wondering whether there
> is a reason for that or whether it makes sense to create a Jira issue
> introducing more specific configuration parameters for limit and requests.
>
>
>
> Best,
> Matthias
>
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#fields-overwritten-by-flink
>
> [2]
> https://github.com/apache/flink/blob/f64261c91b195ecdcd99975b51de540db89a3f48/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/utils/KubernetesUtils.java#L324-L332
>
>
>
> On Thu, Aug 26, 2021 at 11:17 AM Denis Cosmin NUTIU <
> dnu...@bitdefender.com> wrote:
>
> Hello,
>
> I've developed a Flink job and I'm trying to deploy it on a Kubernetes
> cluster using Flink Native.
>
> Setting kubernetes.taskmanager.cpu=0.5 and
> kubernetes.jobmanager.cpu=0.5 sets the requests and limits to 500m,
> which is correct, but I'd like to set the requests and limits to
> different values, something like:
>
> resources:
>   requests:
>     memory: "1048Mi"
>     cpu: "100m"
>   limits:
>     memory: "2096Mi"
>     cpu: "1000m"
>
> I've tried using pod templates from Flink 1.13 and manually patching
> the Kubernetes deployment file. The jobmanager gets spawned with the
> correct resource requests and limits, but the taskmanagers get spawned
> with the defaults:
>
> Limits:
>   cpu:     1
>   memory:  1728Mi
> Requests:
>   cpu:     1
>   memory:  1728Mi
>
> Is there any way I could set the requests/limits for the CPU/Memory to
> different values when deploying Flink in Kubernetes? If not, would it
> make sense to request this as a feature?
>
> Thanks in advance!
>
> Denis
>
>
>
>
>
>
>
>
