Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

Yang Wang Fri, 03 Sep 2021 05:02:46 -0700

Hi Alexis,

Thanks for sharing more thoughts about resource configuration. Your
suggestions make a lot of sense to me.
I believe this could also help others, especially those who are more
familiar with K8s and prefer to use pod templates
as much as possible.


I have created a ticket for this feature[1].

[1]. https://issues.apache.org/jira/browse/FLINK-24150


Best,
Yang



On Fri, Sep 3, 2021 at 5:01 PM Alexis Sarda-Espinosa <
alexis.sarda-espin...@microfocus.com> wrote:

> Hi Yang,
>
>
>
> I understand the issue, and yes, if Flink memory must be specified in the
> configuration anyway, it’s probably better to leave memory configuration in
> the templates empty.
>
>
>
> For the CPU case I still think the template’s requests/limits should have
> priority if they are specified. The factor could still be used if the
> template doesn’t specify anything. I’m not sure if it would be entirely
> intuitive, but the logic could be something like this:
>
>
>
>    1. To choose CPU request
>       1. Read pod template first
>       2. If template doesn’t have anything, read from
>       kubernetes.taskmanager.cpu
>       3. If configuration is not specified, fall back to default
>    2. To choose CPU limit
>       1. Read from template first
>       2. If template doesn’t have anything, apply factor to what was
>       chosen in step 1, where the default factor is 1.
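
The precedence proposed in the list above could be sketched roughly as
follows. This is only an illustration, not Flink code: the class
CpuResolution, its method names, and the fallback DEFAULT_CPU = 1.0 are
assumptions made for the example, and "limitFactor" stands in for the
limit-factor option discussed in this thread.

```java
import java.util.Optional;

/** Hypothetical helper, not part of Flink; mirrors the proposed precedence. */
final class CpuResolution {
    // Assumed fallback when neither the template nor the configuration sets a value.
    static final double DEFAULT_CPU = 1.0;
    // Default factor of 1 means limit == request, as stated in the thread.
    static final double DEFAULT_LIMIT_FACTOR = 1.0;

    /** Step 1: pod template request, else kubernetes.taskmanager.cpu, else default. */
    static double resolveRequest(Optional<Double> templateRequest,
                                 Optional<Double> configuredCpu) {
        return templateRequest.orElseGet(() -> configuredCpu.orElse(DEFAULT_CPU));
    }

    /** Step 2: pod template limit, else the chosen request times the factor. */
    static double resolveLimit(Optional<Double> templateLimit,
                               double request,
                               Optional<Double> limitFactor) {
        double factor = limitFactor.orElse(DEFAULT_LIMIT_FACTOR);
        if (factor < 1.0) {
            throw new IllegalArgumentException("limit factor must be >= 1, got " + factor);
        }
        return templateLimit.orElse(request * factor);
    }
}
```

With a template that sets only a request of 0.1 and a configured factor of
2.0, this sketch would pick a request of 0.1 and a limit of 0.2; a limit
set in the template would take priority over the factor.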
>
>
>
> Regards,
>
> Alexis.
>
>
>
> *From:* Yang Wang <danrtsey...@gmail.com>
> *Sent:* Friday, 3 September 2021 08:09
> *To:* Alexis Sarda-Espinosa <alexis.sarda-espin...@microfocus.com>
> *Cc:* spoon_lz <spoon...@126.com>; Denis Cosmin NUTIU <
> dnu...@bitdefender.com>; matth...@ververica.com; user@flink.apache.org
> *Subject:* Re: Deploying Flink on Kubernetes with fractional CPU and
> different limits and requests
>
>
>
> Hi Alexis
>
>
>
> Thanks for your valuable inputs.
>
>
>
> First, I want to share why Flink has to overwrite the resources that are
> defined in the pod template. You can find the fields that will be
> overwritten by Flink here[1]. I think the major reason is that Flink needs
> to ensure consistency between the Flink configuration
> (taskmanager.memory.process.size, kubernetes.taskmanager.cpu)
> and the pod template resource settings. Since users can specify either the
> total process memory or the detailed memory[2], Flink calculates the
> pod resources internally. If we allowed users to specify the resources via
> the pod template, they would have to guarantee configuration
> consistency themselves, especially when they specify the detailed memory
> (e.g. heap, managed, off-heap, etc.). I believe that would be a new burden
> for them.
>
>
>
> For the limit-factor, you are right that factors aren’t linear. But I
> think the factor is more flexible than an absolute value. A bigger pod
> can usually use more burst resources. Moreover, I do not suggest setting a
> limit-factor for memory since it does not bring much benefit. By
> comparison, burst CPU resources can help performance a lot.
>
>
>
> [1].
> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#pod-template
>
> [2].
> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/memory/mem_setup_tm/#detailed-memory-model
>
>
>
>
>
> @spoon_lz <spoon...@126.com> You are right. The limit-factor should be
> greater than or equal to 1. And the default value is 1.
>
>
>
>
>
> Best,
>
> Yang
>
>
>
> On Thu, Sep 2, 2021 at 8:20 PM Alexis Sarda-Espinosa <
> alexis.sarda-espin...@microfocus.com> wrote:
>
> Just to provide my opinion, I find the idea of factors unintuitive for
> this specific case. When I’m working with Kubernetes resources and sizing,
> I have to think in absolute terms for all pods and define requests and
> limits with concrete values. Using factors for Flink means that I have to
> think differently for my Flink resources, and if I’m using pod templates,
> it makes this switch more jarring because I define what is essentially
> another Kubernetes resource that I’m familiar with, but some of the values
> in my template are ignored. Additionally, if I understand correctly,
> factors aren’t linear, right? If someone specifies a 1GiB request with a
> factor of 1.5, they only get 500MiB on top, but if they specify 10GiB,
> suddenly the limit goes all the way up to 15GiB.
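
The arithmetic behind this point can be written out as a small sketch,
assuming the limit is simply the request multiplied by the factor (which
is how the factor is described in this thread). The class and method names
are made up for the example.

```java
/** Hypothetical illustration of limit = request * factor, in MiB. */
final class LimitFactorMath {
    /** Limit in MiB for a given request and factor. */
    static long limitMiB(long requestMiB, double factor) {
        return Math.round(requestMiB * factor);
    }

    /** Burst headroom in MiB: how far the limit sits above the request. */
    static long headroomMiB(long requestMiB, double factor) {
        return limitMiB(requestMiB, factor) - requestMiB;
    }

    public static void main(String[] args) {
        System.out.println(headroomMiB(1024, 1.5));   // 1 GiB request  -> prints 512
        System.out.println(headroomMiB(10240, 1.5));  // 10 GiB request -> prints 5120
    }
}
```

The same factor yields 512 MiB of headroom for a 1 GiB request and
5120 MiB for a 10 GiB request, so the absolute headroom scales with the
size of the request.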
>
>
>
> Regards,
>
> Alexis.
>
>
>
> *From:* spoon_lz <spoon...@126.com>
> *Sent:* Thursday, 2 September 2021 14:12
> *To:* Yang Wang <danrtsey...@gmail.com>
> *Cc:* Denis Cosmin NUTIU <dnu...@bitdefender.com>; Alexis Sarda-Espinosa <
> alexis.sarda-espin...@microfocus.com>; matth...@ververica.com;
> user@flink.apache.org
> *Subject:* Re: Deploying Flink on Kubernetes with fractional CPU and
> different limits and requests
>
>
>
> Hi Yang,
>
> I agree with you, but I think the limit-factor should be greater than or
> equal to 1, and defaulting to 1 is the better choice.
>
> If the default value is 1.5, the memory limit will exceed the actual
> physical memory of a node, which may result in OOM, machine downtime, or
> random pod death if the node runs full.
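
The overcommit concern can be illustrated with a rough sketch. It assumes
identical pods, a node that is packed purely by memory request (the
Kubernetes scheduler places pods based on requests, not limits), and it
ignores system-reserved memory. All names are hypothetical.

```java
/** Hypothetical check of whether summed memory limits exceed node memory. */
final class OvercommitCheck {
    /** Number of identical pods that fit on the node when packed by request. */
    static long podsThatFit(long nodeMiB, long requestMiB) {
        return nodeMiB / requestMiB;
    }

    /** Sum of the memory limits of all pods on a node packed by request. */
    static long totalLimitMiB(long nodeMiB, long requestMiB, double factor) {
        return podsThatFit(nodeMiB, requestMiB) * Math.round(requestMiB * factor);
    }

    /** True if the pods could together use more memory than the node has. */
    static boolean overcommitted(long nodeMiB, long requestMiB, double factor) {
        return totalLimitMiB(nodeMiB, requestMiB, factor) > nodeMiB;
    }
}
```

For a 16 GiB node and 4 GiB requests, a factor of 1.5 gives four pods with
a combined limit of 24 GiB, which is more than the node has; with a factor
of 1 the combined limit equals the node's memory.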
>
> For jobs that need it, this value can be increased appropriately.
>
>
>
> Best,
>
> Zhuo
>
>
>
>
>
> On 09/2/2021 11:50, Yang Wang <danrtsey...@gmail.com> wrote:
>
> Given that the limit-factor should be greater than 1, then using the
> limit-factor could also work for memory.
>
>
>
> > Why do we need a larger memory resource limit than request?
>
> A typical use case I could imagine is the page cache. Having more page
> cache might improve performance, and it can be reclaimed when the
> Kubernetes node does not have enough memory.
>
>
>
> I still believe that it is the user's responsibility to configure proper
> resources (memory and CPU) that are not too big, and to use the
> limit-factor so that the Flink job can benefit from burst resources.
>
>
>
>
>
> Best,
>
> Yang
>
>
>
> On Wed, Sep 1, 2021 at 8:12 PM spoon_lz <spoon...@126.com> wrote:
>
> Yes, shrinking the requested memory will result in OOM. We do this because
> the user-created job provides an initial value (for example, 2 cpus and
> 4096MB of memory for TaskManager). In most cases, the user will not change
> this value unless the task fails or there is an exception such as data
> delay. This results in a waste of memory for most simple ETL tasks. These
> isolated situations may not apply to more Flink users. We can adjust
> Kubernetes instead of Flink to solve the resource waste problem.
>
> Just adjusting the CPU value might be a more robust choice, and there are
> probably scenarios for both decreasing the CPU request and increasing
> the CPU limit.
>
>
>
> Best,
>
> Zhuo
>
>
>
> On 09/1/2021 19:39, Yang Wang <danrtsey...@gmail.com> wrote:
>
> Hi Lz,
>
> Thanks for sharing your ideas.
>
>
> I have to admit that I prefer a limit factor for setting the resource
> limit over a percentage for setting the resource request.
> That is because the resource request is usually configured or calculated
> by Flink, and it indicates the resources the user requires.
> It has the same semantics for all deployments (e.g. Yarn, K8s). Especially
> for the memory resource, discounting the resource request may cause OOM.
>
> BTW, I am wondering why users do not allocate fewer resources if they
> do not need them.
>
>
>
> @Denis Cosmin NUTIU <dnu...@bitdefender.com> I really appreciate that
> you want to work on this feature. Let's first reach a consensus
> about the implementation. After that, opening a PR is welcome.
>
>
>
>
>
> Best,
>
> Yang
>
>
>
>
>
> On Wed, Sep 1, 2021 at 4:36 PM spoon_lz <spoon...@126.com> wrote:
>
>
>
> Hi everyone,
>
> I have some other ideas for Kubernetes resource settings. As described by
> WangYang in [FLINK-15648], the CPU limit can be increased by a certain
> percentage to provide more computational performance for jobs. Should we
> also consider the alternative of shrinking the request to start more jobs,
> which would improve cluster resource utilization? For example, for some
> low-traffic tasks, we could even set the CPU request to 0 in extreme cases.
> Both enlarging the limit and shrinking the request may be required.
>
>
>
> Best,
>
> Lz
>
> On 09/1/2021 16:06, Denis Cosmin NUTIU <dnu...@bitdefender.com> wrote:
>
> Hi Yang,
>
>
>
> I have limited Flink internals knowledge, but I can try to implement
> FLINK-15648 and open up a PR on GitHub or send the patch via email. How
> does that sound?
>
> I'll sign the ICLA and switch to my personal address.
>
>
>
> Sincerely,
>
> Denis
>
>
>
> On Wed, 2021-09-01 at 13:48 +0800, Yang Wang wrote:
>
> Great. If no one wants to work on this ticket FLINK-15648, I will try to
> get this done in the next major release cycle(1.15).
>
>
>
> Best,
>
> Yang
>
>
>
> On Tue, Aug 31, 2021 at 4:59 PM Denis Cosmin NUTIU <
> dnu...@bitdefender.com> wrote:
>
> Hi everyone,
>
>
>
> Thanks for getting back to me!
>
>
>
> >  I think it would be nice if the task manager pods get their values from
> the configuration file only if the pod templates don’t specify any
> resources. That was the goal of supporting pod templates, right? Allowing
> more custom scenarios without letting the configuration options get bloated.
>
>
>
> I think that's correct. In the current behavior, Flink overrides the
> resource settings: "The memory and cpu resources(including requests and
> limits) will be overwritten by Flink configuration options. All other
> resources(e.g. ephemeral-storage) will be retained."[1] After reading the
> comments on FLINK-15648[2], I'm not sure that it can be done in a clean
> manner with pod templates.
>
>
>
> > I think it is a good improvement to support different resource requests
> and limits. And it is very useful especially for the CPU resource since it
> heavily depends on the upstream workloads.
>
>
>
> I agree with you! I have limited knowledge of Flink internals, but
> kubernetes.jobmanager.limit-factor and kubernetes.taskmanager.limit-factor
> seem to be the right way to do it.
>
>
>
> [1] Native Kubernetes | Apache Flink
> <https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#pod-template>
>
> [2] [FLINK-15648] Support to configure limit for CPU and memory
> requirement - ASF JIRA (apache.org)
> <https://issues.apache.org/jira/browse/FLINK-15648>
>
>
> ------------------------------
>
> *From:* Yang Wang <danrtsey...@gmail.com>
> *Sent:* Tuesday, August 31, 2021 6:04 AM
> *To:* Alexis Sarda-Espinosa <alexis.sarda-espin...@microfocus.com>
> *Cc:* Denis Cosmin NUTIU <dnu...@bitdefender.com>; matth...@ververica.com
>  <matth...@ververica.com>; user@flink.apache.org <user@flink.apache.org>
> *Subject:* Re: Deploying Flink on Kubernetes with fractional CPU and
> different limits and requests
>
>
>
> Hi all,
>
>
>
> I think it is a good improvement to support different resource requests
> and limits. And it is very useful
>
> especially for the CPU resource since it heavily depends on the upstream
> workloads.
>
>
>
> Actually, we (Alibaba) have introduced some internal config options to
> support this feature. WDYT?
>
>
>
> // The prefix of Kubernetes resource limit factor. It should not be less than 1.
> // The resource could be cpu, memory, ephemeral-storage and all other types
> // supported by Kubernetes.
> public static final String KUBERNETES_JOBMANAGER_RESOURCE_LIMIT_FACTOR_PREFIX =
>         "kubernetes.jobmanager.limit-factor.";
> public static final String KUBERNETES_TASKMANAGER_RESOURCE_LIMIT_FACTOR_PREFIX =
>         "kubernetes.taskmanager.limit-factor.";
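
As a rough sketch of how prefix-style options like these might be read,
the example below collects every key under a prefix into a map from
resource name to factor and rejects factors below 1. It uses a plain
java.util.Map instead of Flink's configuration classes to stay
self-contained; the class LimitFactors and its parse method are
hypothetical, and the prefix is the internal one quoted above, not a
confirmed Flink option name.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical parser for "<prefix><resource>" limit-factor entries. */
final class LimitFactors {
    // Prefix as quoted in this thread; treat it as an assumption.
    static final String TASKMANAGER_PREFIX = "kubernetes.taskmanager.limit-factor.";

    /** Collects entries under the prefix into a resource -> factor map. */
    static Map<String, Double> parse(Map<String, String> conf, String prefix) {
        Map<String, Double> factors = new HashMap<>();
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                continue; // unrelated option
            }
            String resource = entry.getKey().substring(prefix.length());
            double factor = Double.parseDouble(entry.getValue());
            if (factor < 1.0) {
                throw new IllegalArgumentException(
                        resource + " limit factor must be >= 1, got " + factor);
            }
            factors.put(resource, factor);
        }
        return factors;
    }
}
```

Under this sketch, a configuration entry such as
kubernetes.taskmanager.limit-factor.cpu: 2.0 would map "cpu" to 2.0, and
resources without an entry would keep the default factor of 1.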
>
>
>
> BTW, we already have an old ticket for this feature[1].
>
>
>
>
>
> [1]. https://issues.apache.org/jira/browse/FLINK-15648
>
>
>
> Best,
>
> Yang
>
>
>
> On Thu, Aug 26, 2021 at 10:04 PM Alexis Sarda-Espinosa <
> alexis.sarda-espin...@microfocus.com> wrote:
>
> I think it would be nice if the task manager pods get their values from
> the configuration file only if the pod templates don’t specify any
> resources. That was the goal of supporting pod templates, right? Allowing
> more custom scenarios without letting the configuration options get bloated.
>
>
>
> Regards,
>
> Alexis.
>
>
>
> *From:* Denis Cosmin NUTIU <dnu...@bitdefender.com>
> *Sent:* Thursday, 26 August 2021 15:55
> *To:* matth...@ververica.com
> *Cc:* user@flink.apache.org; danrtsey...@gmail.com
> *Subject:* Re: Deploying Flink on Kubernetes with fractional CPU and
> different limits and requests
>
>
>
> Hi Matthias,
>
>
>
> Thanks for getting back to me and for your time!
>
>
>
> We have some Flink jobs deployed on Kubernetes and running kubectl top pod
> gives the following result:
>
>
>
>
> NAME                  CPU(cores)   MEMORY(bytes)
> aa-78c8cb77d4-zlmpg   8m           1410Mi
> aa-taskmanager-2-2    32m          1066Mi
> bb-5f7b65f95c-jwb7t   7m           1445Mi
> bb-taskmanager-2-2    32m          1016Mi
> cc-54d967b55d-b567x   11m          514Mi
> cc-taskmanager-4-1    11m          496Mi
> dd-6fbc6b8666-krhlx   10m          535Mi
> dd-taskmanager-2-2    12m          522Mi
> xx-6845cf7986-p45lq   53m          526Mi
> xx-taskmanager-5-2    11m          507Mi
>
>
>
> During low workloads, the jobs consume just about 100m CPU, and during high
> workloads the CPU consumption increases to 500m-1000m. Having the ability
> to specify requests and limits separately would give us more deployment
> flexibility.
>
>
>
> Sincerely,
>
> Denis
>
>
>
> On Thu, 2021-08-26 at 14:22 +0200, Matthias Pohl wrote:
>
> Hi Denis,
>
> I did a bit of digging: It looks like there is no way to specify them
> independently. You can find documentation about pod templates for
> TaskManager and JobManager [1]. But even there it states that for cpu and
> memory, the resource specs are overwritten by the Flink configuration. The
> code also reveals that limit and requests are set using the same value [2].
>
>
>
> I'm going to pull Yang Wang into this thread. I'm wondering whether there
> is a reason for that or whether it makes sense to create a Jira issue
> introducing more specific configuration parameters for limit and requests.
>
>
>
> Best,
> Matthias
>
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#fields-overwritten-by-flink
>
> [2]
> https://github.com/apache/flink/blob/f64261c91b195ecdcd99975b51de540db89a3f48/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/utils/KubernetesUtils.java#L324-L332
>
>
>
> On Thu, Aug 26, 2021 at 11:17 AM Denis Cosmin NUTIU <
> dnu...@bitdefender.com> wrote:
>
> Hello,
>
> I've developed a Flink job and I'm trying to deploy it on a Kubernetes
> cluster using Flink Native.
>
> Setting kubernetes.taskmanager.cpu=0.5 and
> kubernetes.jobmanager.cpu=0.5 sets the requests and limits to 500m,
> which is correct, but I'd like to set the requests and limits to
> different values, something like:
>
> resources:
>   requests:
>     memory: "1048Mi"
>     cpu: "100m"
>   limits:
>     memory: "2096Mi"
>     cpu: "1000m"
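
For reference, the relationship between the fractional CPU setting, the
millicore notation, and a request-to-limit factor can be sketched as
below. The class and method names are invented for the example, and
expressing the desired values as a factor is just one way to read them,
not something Flink 1.13 supports.

```java
/** Hypothetical helpers relating CPU fractions, millicores and factors. */
final class CpuQuantity {
    /** Formats a fractional CPU count as a millicore quantity, e.g. 0.5 -> "500m". */
    static String toMillicores(double cpu) {
        return Math.round(cpu * 1000) + "m";
    }

    /** Factor that would turn the given request into the given limit. */
    static double impliedFactor(double request, double limit) {
        if (request <= 0 || limit < request) {
            throw new IllegalArgumentException("need 0 < request <= limit");
        }
        return limit / request;
    }
}
```

The values above correspond to a CPU factor of 10 (100m to 1000m) and a
memory factor of 2 (1048Mi to 2096Mi).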
>
> I've tried using pod templates from Flink 1.13 and manually patching
> the Kubernetes deployment file. The jobmanager gets spawned with the
> correct resource requests and limits, but the taskmanagers get spawned
> with the defaults:
>
> Limits:
>   cpu:     1
>   memory:  1728Mi
> Requests:
>   cpu:     1
>   memory:  1728Mi
>
> Is there any way I could set the requests/limits for the CPU/Memory to
> different values when deploying Flink in Kubernetes? If not, would it
> make sense to request this as a feature?
>
> Thanks in advance!
>
> Denis
>
>
>
>
>
>
>
>
