[ https://issues.apache.org/jira/browse/FLINK-35489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicolas Fraison updated FLINK-35489:
------------------------------------
Description:
We have enabled the autotuning feature on one of our Flink jobs with the config below:
{code:java}
# Autoscaler configuration
job.autoscaler.enabled: "true"
job.autoscaler.stabilization.interval: 1m
job.autoscaler.metrics.window: 10m
job.autoscaler.target.utilization: "0.8"
job.autoscaler.target.utilization.boundary: "0.1"
job.autoscaler.restart.time: 2m
job.autoscaler.catch-up.duration: 10m
job.autoscaler.memory.tuning.enabled: true
job.autoscaler.memory.tuning.overhead: 0.5
job.autoscaler.memory.tuning.maximize-managed-memory: true{code}
During a scale down, the autotuning decided to give all the memory to the JVM (the heap was scaled by 2), setting taskmanager.memory.managed.size to 0b. Here is the config computed by the autotuning for a TM running on a 4GB pod:
{code:java}
taskmanager.memory.network.max: 4063232b
taskmanager.memory.network.min: 4063232b
taskmanager.memory.jvm-overhead.max: 433791712b
taskmanager.memory.task.heap.size: 3699934605b
taskmanager.memory.framework.off-heap.size: 134217728b
taskmanager.memory.jvm-metaspace.size: 22960020b
taskmanager.memory.framework.heap.size: "0 bytes"
taskmanager.memory.flink.size: 3838215565b
taskmanager.memory.managed.size: 0b {code}
This led to issues starting the TM, because we rely on a javaagent that performs memory allocation outside of the JVM (through C bindings).
Tuning the overhead or disabling scale-down-compensation.enabled could have helped for this particular event, but it can lead to other issues, as it could result in too little heap being computed.
It would be interesting to be able to set a minimum memory.managed.size that the autotuning takes into account.
What do you think about this? Do you think some other specific config should have been applied to avoid this issue?
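A sketch of how the proposed floor could be configured. Note that `job.autoscaler.memory.tuning.managed.min` is a hypothetical option name invented here purely to illustrate the proposal; it is not an existing Flink Kubernetes Operator option:
{code:yaml}
# Existing autotuning options, as used in the job above
job.autoscaler.memory.tuning.enabled: true
job.autoscaler.memory.tuning.maximize-managed-memory: true
# Hypothetical: never let autotuning shrink managed memory below this floor
job.autoscaler.memory.tuning.managed.min: 256 mb
{code}
With such a floor, the autotuner would redistribute at most (current managed size minus the floor) to the heap, instead of zeroing managed memory out entirely.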
Edit: see this comment, which leads to the metaspace issue: https://issues.apache.org/jira/browse/FLINK-35489?focusedCommentId=17850694&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17850694
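As a sanity check on the numbers above (an illustrative calculation, not part of the original report): the Flink memory components sum exactly to taskmanager.memory.flink.size, and adding metaspace and JVM overhead lands within a byte of the 4 GiB pod. In other words, every byte beyond the fixed framework/network/off-heap components was handed to the task heap, leaving managed memory at 0.

```python
# Sizes reported by the autotuner (bytes), copied from the issue above.
network = 4_063_232              # taskmanager.memory.network.min/max
jvm_overhead = 433_791_712       # taskmanager.memory.jvm-overhead.max
task_heap = 3_699_934_605        # taskmanager.memory.task.heap.size
framework_off_heap = 134_217_728 # taskmanager.memory.framework.off-heap.size
metaspace = 22_960_020           # taskmanager.memory.jvm-metaspace.size
framework_heap = 0               # taskmanager.memory.framework.heap.size
managed = 0                      # taskmanager.memory.managed.size
flink_size = 3_838_215_565       # taskmanager.memory.flink.size

# Flink memory = framework heap + task heap + managed + framework off-heap
# + task off-heap (0 here) + network.
components = framework_heap + task_heap + managed + framework_off_heap + network
assert components == flink_size

# Total process memory = Flink memory + metaspace + JVM overhead; it should
# fill the 4 GiB pod (rounding leaves it 1 byte over here).
total = flink_size + metaspace + jvm_overhead
print(total - 4 * 1024**3)
```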
> Metaspace size can be too little after autotuning change memory setting
> -----------------------------------------------------------------------
>
>                 Key: FLINK-35489
>                 URL: https://issues.apache.org/jira/browse/FLINK-35489
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kubernetes Operator
>    Affects Versions: 1.8.0
>            Reporter: Nicolas Fraison
>            Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)