Hi Alex,

Could you check and post your TM launch command? I suspect there might be
some unrecognized arguments that prevent the rest of the arguments from
being parsed.

The TM memory configuration process works as follows:

   1. The resource manager will parse the configurations, checking which
   options are configured and which are not, and calculate the size of each
   memory component. (This is where ‘taskmanager.memory.process.size’ is used.)
   2. After deriving the memory component sizes, the resource manager will
   generate the launch command for the task managers, with dynamic
   configurations "-D <key=value>" overwriting the memory component sizes.
   Therefore, even if you have not configured 'taskmanager.memory.task.heap.size',
   this config option is expected to be available when the TM is launched.
   3. When a task manager is started, it does not do the calculations
   again; it directly reads the memory component sizes calculated by the
   resource manager from the dynamic configurations. That means it does not
   read ‘taskmanager.memory.process.size’ and derive the memory component
   sizes from it again.
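To make steps 1 and 2 a bit more concrete, here is a rough, hypothetical Java sketch (not the actual Flink code) of how a resource manager could derive the component sizes from 'taskmanager.memory.process.size' and turn them into "-D key=value" arguments. The fractions, bounds and fixed sizes are assumptions meant to mirror the Flink 1.10 defaults, the class and method names are made up, and the real derivation has more rules than this.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified sketch of steps 1 and 2; NOT Flink's actual code.
// All fractions, bounds and fixed sizes are ASSUMED (intended to mirror Flink 1.10
// defaults), and several rules of the real derivation are omitted.
final class TmMemorySketch {
    static final long MIB = 1024L * 1024L;

    static long clamp(long value, long min, long max) {
        return Math.max(min, Math.min(max, value));
    }

    // Step 1: derive component sizes (in bytes) from the total process size.
    static Map<String, Long> deriveComponents(long processSize) {
        long jvmOverhead = clamp((long) (processSize * 0.1), 192 * MIB, 1024 * MIB); // assumed
        long metaspace = 96 * MIB;                                                   // assumed
        long totalFlink = processSize - jvmOverhead - metaspace;
        long frameworkHeap = 128 * MIB;                                              // assumed
        long frameworkOffHeap = 128 * MIB;                                           // assumed
        long taskOffHeap = 0L;                                                       // assumed
        long network = clamp((long) (totalFlink * 0.1), 64 * MIB, 1024 * MIB);      // assumed
        long managed = (long) (totalFlink * 0.4);                                    // assumed
        // Task heap is whatever remains of the total Flink memory.
        long taskHeap = totalFlink - frameworkHeap - frameworkOffHeap
                - taskOffHeap - network - managed;

        // Key names are assumed to mirror Flink's memory options.
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("taskmanager.memory.framework.heap.size", frameworkHeap);
        sizes.put("taskmanager.memory.task.heap.size", taskHeap);
        sizes.put("taskmanager.memory.framework.off-heap.size", frameworkOffHeap);
        sizes.put("taskmanager.memory.task.off-heap.size", taskOffHeap);
        sizes.put("taskmanager.memory.managed.size", managed);
        sizes.put("taskmanager.memory.network.min", network);
        sizes.put("taskmanager.memory.network.max", network);
        return sizes;
    }

    // Step 2: turn the derived sizes into "-D key=value" dynamic configurations
    // to be appended to the TM launch command (format is illustrative).
    static List<String> toDynamicConfigs(Map<String, Long> sizes) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, Long> e : sizes.entrySet()) {
            args.add("-D");
            args.add(e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] argv) {
        // Equivalent of taskmanager.memory.process.size: 2g
        Map<String, Long> sizes = deriveComponents(2048 * MIB);
        long jvmHeap = sizes.get("taskmanager.memory.framework.heap.size")
                + sizes.get("taskmanager.memory.task.heap.size");
        System.out.println("JVM heap bytes: " + jvmHeap);
        System.out.println(String.join(" ", toDynamicConfigs(sizes)));
    }
}
```

With a 2g process size, the sketch gives a JVM heap (framework heap + task heap) of 781818267 bytes, within a few dozen bytes of the -Xmx781818251 in your log, presumably because of the sketch's simplified rounding. The point is that 'taskmanager.memory.task.heap.size' only reaches the TM through these generated arguments, which it reads as described in step 3.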

One thing that might have caused your problem: when MesosTaskExecutorRunner
parses the command line arguments (that's where the dynamic configurations
are passed in), it stops parsing the rest of the arguments as soon as it
meets an unrecognized token. That could be why
'taskmanager.memory.task.heap.size' is missing. You can take a look at the
launch command and see if there's anything unexpected before the memory
dynamic configurations.

Thank you~

Xintong Song



On Thu, Mar 12, 2020 at 2:26 PM Yangze Guo <karma...@gmail.com> wrote:

> Hi, Alexander
>
> I could not reproduce it in my local environment. Normally, the Mesos RM
> calculates all the memory configs and adds them to the launch command.
> Unfortunately, all the logging I could find for this command is at the
> DEBUG level. Would you mind changing the log level to DEBUG, or sharing
> anything about the taskmanager launch command you can find in the
> current log?
>
>
> Best,
> Yangze Guo
>
> On Thu, Mar 12, 2020 at 1:38 PM Alexander Kasyanenko <as.kasyane...@gmail.com> wrote:
> >
> > Hi folks,
> >
> > I have a question related to the new memory configuration introduced in
> > Flink 1.10. Has anyone encountered a similar problem?
> > I'm trying to use the taskmanager.memory.process.size configuration key
> > with a Mesos session cluster, but I get an error like this:
> >
> > 2020-03-11 11:44:09,771 [main] ERROR org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     - Error while starting the TaskManager
> > org.apache.flink.configuration.IllegalConfigurationException: Failed to create TaskExecutorResourceSpec
> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:72)
> >     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.startTaskManager(TaskManagerRunner.java:356)
> >     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.<init>(TaskManagerRunner.java:152)
> >     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:308)
> >     at org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner.lambda$main$0(MesosTaskExecutorRunner.java:106)
> >     at java.base/java.security.AccessController.doPrivileged(Native Method)
> >     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
> >     at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
> >     at org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner.main(MesosTaskExecutorRunner.java:105)
> > Caused by: org.apache.flink.configuration.IllegalConfigurationException: The required configuration option Key: 'taskmanager.memory.task.heap.size' , default: null (fallback keys: []) is not set
> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkConfigOptionIsSet(TaskExecutorResourceUtils.java:90)
> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.lambda$checkTaskExecutorResourceConfigSet$0(TaskExecutorResourceUtils.java:84)
> >     at java.base/java.util.Arrays$ArrayList.forEach(Arrays.java:4390)
> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkTaskExecutorResourceConfigSet(TaskExecutorResourceUtils.java:84)
> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:70)
> >     ... 9 more
> >
> > But when the task manager is launched, it correctly parses the process
> > memory key:
> >
> > 2020-03-11 11:43:55,376 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     - --------------------------------------------------------------------------------
> > 2020-03-11 11:43:55,377 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  Starting MesosTaskExecutorRunner (Version: 1.10.0, Rev:aa4eb8f, Date:07.02.2020 @ 19:18:19 CET)
> > 2020-03-11 11:43:55,377 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  OS current user: root
> > 2020-03-11 11:43:57,347 [main] WARN  org.apache.hadoop.util.NativeCodeLoader                       - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > 2020-03-11 11:43:57,535 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  JVM: OpenJDK 64-Bit Server VM - AdoptOpenJDK - 11/11.0.2+9
> > 2020-03-11 11:43:57,535 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  Maximum heap size: 746 MiBytes
> > 2020-03-11 11:43:57,535 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  JAVA_HOME: (not set)
> > 2020-03-11 11:43:57,539 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  Hadoop version: 2.6.5
> > 2020-03-11 11:43:57,539 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  JVM Options:
> > 2020-03-11 11:43:57,539 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -Xmx781818251
> > 2020-03-11 11:43:57,539 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -Xms781818251
> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -XX:MaxDirectMemorySize=317424929
> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -XX:MaxMetaspaceSize=100663296
> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -Dlog.file=/var/log/flink-session-cluster/taskmanager.log
> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -Dlog4j.configuration=file:/opt/flink/conf/log4j.properties
> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -Dlogback.configurationFile=file:/opt/flink/conf/logback.xml
> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  Program Arguments: (none)
> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  Classpath: /opt/flink/lib/apache-log4j-extras-1.2.17.jar:/opt/flink/lib/flink-metrics-graphite-1.10.0.jar:/opt/flink/lib/flink-shaded-hadoop-2-uber-2.6.5-8.0.jar:/opt/flink/lib/flink-table-blink_2.12-1.10.0.jar:/opt/flink/lib/flink-table_2.12-1.10.0.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.15.jar:/opt/flink/lib/flink-dist_2.12-1.10.0.jar:
> > 2020-03-11 11:43:57,541 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     - --------------------------------------------------------------------------------
> > 2020-03-11 11:43:57,542 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     - Registered UNIX signal handlers for [TERM, HUP, INT]
> > 2020-03-11 11:43:57,550 [main] INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.memory.process.size, 2g
> > 2020-03-11 11:43:57,550 [main] INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.cpu.cores, 2
> > 2020-03-11 11:43:57,551 [main] INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 4
> > 2020-03-11 11:43:57,551 [main] INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 1
> > ...
> >
> > Judging by the docs, specifying the taskmanager.memory.process.size key
> > should be enough to launch the job, but it seems like this value is ignored.
> > I would appreciate any suggestions.
> >
> > Regards and thanks in advance,
> > Alex.
>
