Hi Yangze, Xintong,

Thank you for the quick response, and big thanks for the hint about the TM launch command. It was indeed the problem.

I've added my own custom mesos-taskmanager.sh <https://gist.github.com/Atlaster/305b5d63429e7dbf264d43a6cc4d72e5> to echo the launch command (I had switched logging to DEBUG level, but it didn't display anything useful). May I suggest adding something like this in a future release?

As for my particular case, the issue was in this mesos-appmaster.sh option:

-Dmesos.resourcemanager.tasks.taskmanager-cmd="/opt/job/custom_launch_tm.sh"

My custom launch script was slicing the argument array incorrectly.

Thanks for the help and regards,
Alex.

Thu, Mar 12, 2020 at 15:46, Xintong Song <tonysong...@gmail.com>:

> Hi Alex,
>
> Could you try to check and post your TM launch command? I suspect that
> there might be some unrecognized arguments that prevent the rest of the
> arguments from being parsed.
>
> The TM memory configuration process works as follows:
>
> 1. The resource manager parses the configuration, checks which options
>    are configured and which are not, and calculates the size of each
>    memory component. (This is where 'taskmanager.memory.process.size' is
>    used.)
> 2. After deriving the memory component sizes, the resource manager
>    generates the launch command for the task managers, with dynamic
>    configurations "-D <key=value>" overwriting the memory component
>    sizes. Therefore, even if you have not configured
>    'taskmanager.memory.task.heap.size', this config option is expected
>    to be available by the time the TM is launched.
> 3. When a task manager is started, it does not do the calculations
>    again. It directly reads the memory component sizes calculated by the
>    resource manager from the dynamic configurations. That means it does
>    not read 'taskmanager.memory.process.size' and derive the memory
>    component sizes from it again.
>
> One thing that might have caused your problem is that, when
> MesosTaskExecutorRunner parses the command line arguments (that's where
> the dynamic configurations are passed in), if it meets an unrecognized
> token it will stop parsing the rest of the arguments. That could be the
> reason that 'taskmanager.memory.task.heap.size' is missing. You can take
> a look at the launch command and see if there's anything unexpected
> before the memory dynamic configurations.
>
> Thank you~
>
> Xintong Song
>
> On Thu, Mar 12, 2020 at 2:26 PM Yangze Guo <karma...@gmail.com> wrote:
>
>> Hi, Alexander
>>
>> I could not reproduce it in my local environment. Normally, the Mesos RM
>> calculates all the memory configs and adds them to the launch command.
>> Unfortunately, all the logging I could find for this command is at the
>> DEBUG level. Would you mind changing the log level to DEBUG or sharing
>> anything about the taskmanager launch command you can find in the
>> current log?
>>
>> Best,
>> Yangze Guo
>>
>> On Thu, Mar 12, 2020 at 1:38 PM Alexander Kasyanenko
>> <as.kasyane...@gmail.com> wrote:
>> >
>> > Hi folks,
>> >
>> > I have a question about the new memory configuration introduced in
>> > Flink 1.10. Has anyone encountered a similar problem?
>> > I'm trying to make use of the taskmanager.memory.process.size
>> > configuration key in combination with a Mesos session cluster, but I
>> > get an error like this:
>> >
>> > 2020-03-11 11:44:09,771 [main] ERROR org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Error while starting the TaskManager
>> > org.apache.flink.configuration.IllegalConfigurationException: Failed to create TaskExecutorResourceSpec
>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:72)
>> >     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.startTaskManager(TaskManagerRunner.java:356)
>> >     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.<init>(TaskManagerRunner.java:152)
>> >     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:308)
>> >     at org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner.lambda$main$0(MesosTaskExecutorRunner.java:106)
>> >     at java.base/java.security.AccessController.doPrivileged(Native Method)
>> >     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
>> >     at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>> >     at org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner.main(MesosTaskExecutorRunner.java:105)
>> > Caused by: org.apache.flink.configuration.IllegalConfigurationException: The required configuration option Key: 'taskmanager.memory.task.heap.size' , default: null (fallback keys: []) is not set
>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkConfigOptionIsSet(TaskExecutorResourceUtils.java:90)
>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.lambda$checkTaskExecutorResourceConfigSet$0(TaskExecutorResourceUtils.java:84)
>> >     at java.base/java.util.Arrays$ArrayList.forEach(Arrays.java:4390)
>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkTaskExecutorResourceConfigSet(TaskExecutorResourceUtils.java:84)
>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:70)
>> >     ... 9 more
>> >
>> > But when the task manager is launched, it correctly parses the process
>> > memory key:
>> >
>> > 2020-03-11 11:43:55,376 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - --------------------------------------------------------------------------------
>> > 2020-03-11 11:43:55,377 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Starting MesosTaskExecutorRunner (Version: 1.10.0, Rev:aa4eb8f, Date:07.02.2020 @ 19:18:19 CET)
>> > 2020-03-11 11:43:55,377 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - OS current user: root
>> > 2020-03-11 11:43:57,347 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> > 2020-03-11 11:43:57,535 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - JVM: OpenJDK 64-Bit Server VM - AdoptOpenJDK - 11/11.0.2+9
>> > 2020-03-11 11:43:57,535 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Maximum heap size: 746 MiBytes
>> > 2020-03-11 11:43:57,535 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - JAVA_HOME: (not set)
>> > 2020-03-11 11:43:57,539 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Hadoop version: 2.6.5
>> > 2020-03-11 11:43:57,539 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - JVM Options:
>> > 2020-03-11 11:43:57,539 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -Xmx781818251
>> > 2020-03-11 11:43:57,539 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -Xms781818251
>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -XX:MaxDirectMemorySize=317424929
>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -XX:MaxMetaspaceSize=100663296
>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -Dlog.file=/var/log/flink-session-cluster/taskmanager.log
>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -Dlog4j.configuration=file:/opt/flink/conf/log4j.properties
>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -Dlogback.configurationFile=file:/opt/flink/conf/logback.xml
>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Program Arguments: (none)
>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Classpath: /opt/flink/lib/apache-log4j-extras-1.2.17.jar:/opt/flink/lib/flink-metrics-graphite-1.10.0.jar:/opt/flink/lib/flink-shaded-hadoop-2-uber-2.6.5-8.0.jar:/opt/flink/lib/flink-table-blink_2.12-1.10.0.jar:/opt/flink/lib/flink-table_2.12-1.10.0.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.15.jar:/opt/flink/lib/flink-dist_2.12-1.10.0.jar:
>> > 2020-03-11 11:43:57,541 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - --------------------------------------------------------------------------------
>> > 2020-03-11 11:43:57,542 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Registered UNIX signal handlers for [TERM, HUP, INT]
>> > 2020-03-11 11:43:57,550 [main] INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.process.size, 2g
>> > 2020-03-11 11:43:57,550 [main] INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.cpu.cores, 2
>> > 2020-03-11 11:43:57,551 [main] INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 4
>> > 2020-03-11 11:43:57,551 [main] INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
>> > ...
>> >
>> > Judging by the docs, specifying the taskmanager.memory.process.size key
>> > should be enough to launch the job, but it seems like this value is
>> > ignored.
>> > I would appreciate any suggestions.
>> >
>> > Regards and thanks in advance,
>> > Alex.
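
For anyone who lands on this thread with the same error, here is a minimal sketch of the two things discussed above: echoing the launch command from a custom wrapper, and forwarding the arguments without slicing them. It is not the script from the gist. All function names and the sample -D values are hypothetical, and it assumes the resource manager passes the dynamic configurations to the configured command as ordinary arguments, as Xintong describes.

```shell
#!/bin/sh
# Hypothetical sketch of a custom TaskManager launch wrapper. Function names
# and sample arguments are illustrative only, not taken from Flink or the gist.

# Echo the launch arguments to stderr so they show up in the task's stderr
# even when the Flink loggers print nothing useful about the command.
log_launch_command() {
    printf 'TM launch command: %s\n' "$*" >&2
}

# Stand-in for the real entry point: prints each received argument on its
# own line. A real wrapper would instead end with something like:
#   exec "$FLINK_HOME/bin/mesos-taskmanager.sh" "$@"
fake_taskmanager() {
    for arg in "$@"; do
        printf '%s\n' "$arg"
    done
}

# Correct forwarding: "$@" passes every argument through unchanged.
launch_forward_all() {
    log_launch_command "$@"
    fake_taskmanager "$@"
}

# Buggy forwarding: an off-by-one shift silently drops the first argument,
# which may be one of the -D memory options the TaskManager requires.
launch_forward_sliced() {
    shift 1
    fake_taskmanager "$@"
}
```

Called with the sample arguments `-Dtaskmanager.memory.task.heap.size=100m -Dfoo=bar`, `launch_forward_all` hands both options to the stand-in entry point, while `launch_forward_sliced` hands over only `-Dfoo=bar`. A wrapper that loses arguments like this would be consistent with the "Program Arguments: (none)" line and the missing 'taskmanager.memory.task.heap.size' key in the quoted log.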