I would suggest not setting -Xmx. Flink always calculates the JVM heap size from the configuration and sets a proper -Xmx. If you manually set -Xmx, it overwrites the value Flink calculated, which may result in unpredictable behavior.
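As a quick illustration of why this is risky: if duplicate -Xmx flags end up on the JVM command line (Flink's calculated one plus one appended via FLINK_ENV_JAVA_OPTS), the HotSpot JVM takes the last occurrence, so the manual value silently wins. The sketch below emulates that "last flag wins" rule in plain shell; the sizes are made up for the example.

```shell
# Emulate the JVM's "last -Xmx wins" behavior in plain shell.
# "-Xmx697m" stands in for the value Flink calculated; "-Xmx2g" for a
# manual override appended via FLINK_ENV_JAVA_OPTS (sizes are made up).
cmdline="-Xmx697m -Xms697m -Xmx2g"
last_xmx=""
for flag in $cmdline; do
  case "$flag" in
    -Xmx*) last_xmx="${flag#-Xmx}" ;;   # later occurrences overwrite earlier ones
  esac
done
echo "effective max heap: $last_xmx"    # prints "effective max heap: 2g"
```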
Please refer to this document [1]. In short, you can leverage the configuration option "taskmanager.memory.task.heap.size"; a constant framework overhead will be added to this value to derive -Xmx.

Thank you~

Xintong Song

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_detail.html#jvm-parameters

On Fri, Jun 12, 2020 at 4:50 PM Clay Teeter <clay.tee...@maalka.com> wrote:

> Thank you Xintong. While tracking down the existence of
> bash-java-utils.jar, I found a bug in my CI scripts that built the wrong
> version of Flink. I fixed this and then added an -Xmx value:
>
>     env:
>       - name: FLINK_ENV_JAVA_OPTS
>         value: "-Xmx{{ .Values.analytics.flink.taskManagerHeapSize }}"
>
> It's running perfectly now!
>
> Thank you again,
> Clay
>
> On Fri, Jun 12, 2020 at 5:13 AM Xintong Song <tonysong...@gmail.com> wrote:
>
>> Hi Clay,
>>
>> Could you verify that the "taskmanager.sh" used is the same script
>> shipped with Flink 1.10.1, or is a custom script used? Also, does the
>> jar file "bash-java-utils.jar" exist in your Flink bin directory?
>>
>> In Flink 1.10, the memory configuration for a TaskManager works as
>> follows:
>>
>> - "taskmanager.sh" executes "bash-java-utils.jar" for the memory
>>   calculations.
>> - "bash-java-utils.jar" reads your "flink-conf.yaml" and all the "-D"
>>   arguments, and calculates the memory sizes accordingly.
>> - "bash-java-utils.jar" then returns the memory calculation results as
>>   two strings, for the JVM parameters ("-Xmx", "-Xms", etc.)
>>   and the dynamic configurations ("-D") respectively.
>>   - At this step, all the detailed memory sizes should be determined.
>>   - That means that even for memory sizes you did not configure, an
>>     exact value should appear in the returned dynamic configurations.
>>   - That also means that for memory components configured as ranges
>>     (e.g., network memory configured through a [min, max] pair), a
>>     deterministic value is decided and both the min and max
>>     configuration options are overwritten to that value.
>> - "taskmanager.sh" starts the task manager JVM process with the
>>   returned JVM parameters, and passes the dynamic configurations as
>>   arguments into the task manager process. These dynamic configurations
>>   are read by the Flink task manager so that memory is managed
>>   accordingly.
>>
>> The Flink task manager expects all memory configurations to already be
>> set (thus network min and max should have the same value) before it
>> starts. In your case, such configurations seem to be missing. The same
>> applies to the CPU cores.
>>
>> Thank you~
>>
>> Xintong Song
>>
>> On Fri, Jun 12, 2020 at 12:58 AM Clay Teeter <clay.tee...@maalka.com>
>> wrote:
>>
>>> Hi flink fans,
>>>
>>> I'm hoping for an easy solution. I'm trying to upgrade my 1.9.3
>>> cluster to Flink 1.10.1, but I'm running into memory configuration
>>> errors.
>>> Such as:
>>>
>>> Caused by: org.apache.flink.configuration.IllegalConfigurationException:
>>> The network memory min (64 mb) and max (1 gb) mismatch, the network
>>> memory has to be resolved and set to a fixed value before task executor
>>> starts
>>>
>>> Caused by: org.apache.flink.configuration.IllegalConfigurationException:
>>> The required configuration option Key: 'taskmanager.cpu.cores', default:
>>> null (fallback keys: []) is not set
>>>
>>> I was able to fix a cascade of errors by explicitly setting these
>>> values:
>>>
>>>     taskmanager.memory.managed.size: {{ .Values.analytics.flink.taskManagerManagedSize }}
>>>     taskmanager.memory.task.heap.size: {{ .Values.analytics.flink.taskManagerHeapSize }}
>>>     taskmanager.memory.jvm-metaspace.size: 500m
>>>     taskmanager.cpu.cores: 4
>>>
>>> So, the documentation implies that Flink will default many of these
>>> values, but my 1.10.1 cluster doesn't seem to be doing this. 1.9.3
>>> worked great!
>>>
>>> Do I really have to set all the memory (even network) values? If not,
>>> what am I missing?
>>>
>>> If I do have to set all the memory parameters, how do I resolve "The
>>> network memory min (64 mb) and max (1 gb) mismatch"?
>>>
>>> My cluster runs standalone jobs on Kubernetes.
>>>
>>> flink-conf.yaml:
>>>
>>>     state.backend: rocksdb
>>>     state.backend.incremental: true
>>>     state.checkpoints.num-retained: 1
>>>     taskmanager.memory.managed.size: {{ .Values.analytics.flink.taskManagerManagedSize }}
>>>     taskmanager.memory.task.heap.size: {{ .Values.analytics.flink.taskManagerHeapSize }}
>>>     taskmanager.memory.jvm-metaspace.size: 500m
>>>     taskmanager.cpu.cores: 4
>>>     taskmanager.numberOfTaskSlots: {{ .Values.analytics.task.numberOfTaskSlots }}
>>>     parallelism.default: {{ .Values.analytics.flink.parallelism }}
>>>
>>> JobManager:
>>>
>>>     command: ["/opt/flink/bin/standalone-job.sh"]
>>>     args: ["start-foreground", "-j={{ .Values.analytics.flinkRunnable }}", ...
>>> TaskManager:
>>>
>>>     command: ["/opt/flink/bin/taskmanager.sh"]
>>>     args: [
>>>       "start-foreground",
>>>       "-Djobmanager.rpc.address=localhost",
>>>       "-Dmetrics.reporter.prom.port=9430"]
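To make the "network memory min (64 mb) and max (1 gb) mismatch" error from the thread concrete: in Flink 1.10, network memory is derived as a fraction of the total Flink memory and clamped into [min, max], and the startup scripts are supposed to pin both min and max to that resolved value before the task executor starts. The sketch below illustrates that resolution; it is not Flink's actual code. The total size is a made-up example, while the fraction/min/max are the documented 1.10 defaults that match the 64 mb and 1 gb in the error message.

```shell
# Sketch of the range resolution the startup scripts perform (illustrative
# only): network memory = fraction of total Flink memory, clamped into
# [min, max], then both min and max are pinned to the resolved value.
total_mb=1728        # hypothetical total Flink memory in MB
fraction_pct=10      # taskmanager.memory.network.fraction = 0.1 (default)
min_mb=64            # taskmanager.memory.network.min = 64mb (default)
max_mb=1024          # taskmanager.memory.network.max = 1gb (default)

network_mb=$(( total_mb * fraction_pct / 100 ))   # 172 MB in this example
if [ "$network_mb" -lt "$min_mb" ]; then network_mb=$min_mb; fi
if [ "$network_mb" -gt "$max_mb" ]; then network_mb=$max_mb; fi

# With min == max, the task executor's mismatch check passes.
echo "-Dtaskmanager.memory.network.min=${network_mb}m"
echo "-Dtaskmanager.memory.network.max=${network_mb}m"
```

This is why the task executor refuses to start when it still sees a [64mb, 1gb] range: the resolution step above never ran, which on a standalone setup points at the scripts (or a missing bash-java-utils.jar) rather than at the configuration itself.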