Glad to hear that your issue is fixed. I'm not sure what you are suggesting we add. Could you be more specific, or create a Jira ticket?

Best,
Yangze Guo

On Thu, Mar 12, 2020 at 3:51 PM Alexander Kasyanenko <as.kasyane...@gmail.com> wrote:
>
> Hi Yangze, Xintong,
>
> Thank you for the instant response.
>
> And big thanks for the hint on the TM launch command. It indeed was the problem.
> I've added my own custom mesos-taskmanager.sh to echo the launch command
> (I've switched to DEBUG level on logging, but it didn't really display
> anything useful). May I suggest adding something like this in future
> releases?
>
> As for my particular case, the issue was in a mesos-appmaster.sh option:
>
> -Dmesos.resourcemanager.tasks.taskmanager-cmd="/opt/job/custom_launch_tm.sh"
>
> My custom launch script was slicing the argument array incorrectly.
>
> Thanks for the help and regards,
> Alex.
>
> Thu, Mar 12, 2020 at 15:46, Xintong Song <tonysong...@gmail.com>:
>>
>> Hi Alex,
>>
>> Could you try to check and post your TM launch command? I suspect that there
>> might be some unrecognized arguments that prevent the rest of the arguments
>> from being parsed.
>>
>> The TM memory configuration process works as follows:
>>
>> 1. The resource manager will parse the configurations, checking which options
>> are configured and which are not, and calculate the size of each memory
>> component. (This is where 'taskmanager.memory.process.size' is used.)
>> 2. After deriving the memory component sizes, the resource manager will
>> generate the launch command for the task managers, with dynamic configurations
>> "-D <key=value>" overwriting the memory component sizes. Therefore, even if you
>> have not configured 'taskmanager.memory.task.heap.size', this config option is
>> expected to be available by the time the TM is launched.
>> 3. When a task manager is started, it will not do the calculations again, and
>> will directly read the memory component sizes calculated by the resource manager
>> from the dynamic configurations. That means it is not reading
>> 'taskmanager.memory.process.size' and deriving memory component sizes from
>> it again.
>>
>> One thing that might have caused your problem is that, when
>> MesosTaskExecutorRunner parses the command line arguments (that's where the
>> dynamic configurations are passed in), if it meets an unrecognized token it
>> will stop parsing the rest of the arguments. That could be the reason that
>> 'taskmanager.memory.task.heap.size' is missing. You can take a look at the
>> launch command and see if there's anything unexpected before the memory
>> dynamic configurations.
>>
>> Thank you~
>>
>> Xintong Song
>>
>>
>>
>> On Thu, Mar 12, 2020 at 2:26 PM Yangze Guo <karma...@gmail.com> wrote:
>>>
>>> Hi, Alexander
>>>
>>> I could not reproduce it in my local environment. Normally, Mesos RM
>>> will calculate all the mem config and add it to the launch command.
>>> Unfortunately, all the logging I could find for this command is at the
>>> DEBUG level. Would you mind changing the log level to DEBUG or sharing
>>> anything about the taskmanager launch command you could find in the
>>> current log?
>>>
>>>
>>> Best,
>>> Yangze Guo
>>>
>>> On Thu, Mar 12, 2020 at 1:38 PM Alexander Kasyanenko
>>> <as.kasyane...@gmail.com> wrote:
>>> >
>>> > Hi folks,
>>> >
>>> > I have a question related to configuration of the new memory model introduced
>>> > in Flink 1.10. Has anyone encountered a similar problem?
>>> > I'm trying to make use of the taskmanager.memory.process.size configuration
>>> > key in combination with a Mesos session cluster, but I get an error like
>>> > this:
>>> >
>>> > 2020-03-11 11:44:09,771 [main] ERROR org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Error while starting the TaskManager
>>> > org.apache.flink.configuration.IllegalConfigurationException: Failed to create TaskExecutorResourceSpec
>>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:72)
>>> >     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.startTaskManager(TaskManagerRunner.java:356)
>>> >     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.<init>(TaskManagerRunner.java:152)
>>> >     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:308)
>>> >     at org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner.lambda$main$0(MesosTaskExecutorRunner.java:106)
>>> >     at java.base/java.security.AccessController.doPrivileged(Native Method)
>>> >     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
>>> >     at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>>> >     at org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner.main(MesosTaskExecutorRunner.java:105)
>>> > Caused by: org.apache.flink.configuration.IllegalConfigurationException: The required configuration option Key: 'taskmanager.memory.task.heap.size' , default: null (fallback keys: []) is not set
>>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkConfigOptionIsSet(TaskExecutorResourceUtils.java:90)
>>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.lambda$checkTaskExecutorResourceConfigSet$0(TaskExecutorResourceUtils.java:84)
>>> >     at java.base/java.util.Arrays$ArrayList.forEach(Arrays.java:4390)
>>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkTaskExecutorResourceConfigSet(TaskExecutorResourceUtils.java:84)
>>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:70)
>>> >     ... 9 more
>>> >
>>> > But when the task manager is launched, it correctly parses the process memory key:
>>> >
>>> > 2020-03-11 11:43:55,376 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - --------------------------------------------------------------------------------
>>> > 2020-03-11 11:43:55,377 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Starting MesosTaskExecutorRunner (Version: 1.10.0, Rev:aa4eb8f, Date:07.02.2020 @ 19:18:19 CET)
>>> > 2020-03-11 11:43:55,377 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - OS current user: root
>>> > 2020-03-11 11:43:57,347 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> > 2020-03-11 11:43:57,535 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - JVM: OpenJDK 64-Bit Server VM - AdoptOpenJDK - 11/11.0.2+9
>>> > 2020-03-11 11:43:57,535 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Maximum heap size: 746 MiBytes
>>> > 2020-03-11 11:43:57,535 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - JAVA_HOME: (not set)
>>> > 2020-03-11 11:43:57,539 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Hadoop version: 2.6.5
>>> > 2020-03-11 11:43:57,539 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - JVM Options:
>>> > 2020-03-11 11:43:57,539 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -Xmx781818251
>>> > 2020-03-11 11:43:57,539 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -Xms781818251
>>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -XX:MaxDirectMemorySize=317424929
>>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -XX:MaxMetaspaceSize=100663296
>>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -Dlog.file=/var/log/flink-session-cluster/taskmanager.log
>>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -Dlog4j.configuration=file:/opt/flink/conf/log4j.properties
>>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -Dlogback.configurationFile=file:/opt/flink/conf/logback.xml
>>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Program Arguments: (none)
>>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Classpath: /opt/flink/lib/apache-log4j-extras-1.2.17.jar:/opt/flink/lib/flink-metrics-graphite-1.10.0.jar:/opt/flink/lib/flink-shaded-hadoop-2-uber-2.6.5-8.0.jar:/opt/flink/lib/flink-table-blink_2.12-1.10.0.jar:/opt/flink/lib/flink-table_2.12-1.10.0.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.15.jar:/opt/flink/lib/flink-dist_2.12-1.10.0.jar:
>>> > 2020-03-11 11:43:57,541 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - --------------------------------------------------------------------------------
>>> > 2020-03-11 11:43:57,542 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Registered UNIX signal handlers for [TERM, HUP, INT]
>>> > 2020-03-11 11:43:57,550 [main] INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.process.size, 2g
>>> > 2020-03-11 11:43:57,550 [main] INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.cpu.cores, 2
>>> > 2020-03-11 11:43:57,551 [main] INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 4
>>> > 2020-03-11 11:43:57,551 [main] INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
>>> > ...
>>> >
>>> > Judging by the docs, specifying the taskmanager.memory.process.size key should
>>> > be enough to launch the job, but it seems like this value is ignored.
>>> > I would appreciate any suggestion.
>>> >
>>> > Regards and thanks in advance,
>>> > Alex.
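
For anyone finding this thread later: the behavior Xintong describes in the thread (the parser stops at the first unrecognized token, so every dynamic configuration after it is lost) can be illustrated with a small toy sketch. This is not Flink's actual parser. The function name `parse_dynamic_properties` is hypothetical, and the memory value is made up for illustration; only the config key comes from the error message in the thread.

```python
from typing import Dict, List


def parse_dynamic_properties(args: List[str]) -> Dict[str, str]:
    """Toy parser (NOT Flink's implementation): collects '-D key=value' and
    '-Dkey=value' pairs and stops at the first token it does not recognize."""
    props: Dict[str, str] = {}
    i = 0
    while i < len(args):
        token = args[i]
        if token == "-D" and i + 1 < len(args) and "=" in args[i + 1]:
            pair = args[i + 1]   # form: -D key=value
            i += 2
        elif token.startswith("-D") and "=" in token:
            pair = token[2:]     # form: -Dkey=value
            i += 1
        else:
            break                # unrecognized token: the rest is ignored
        key, value = pair.split("=", 1)
        props[key] = value
    return props


# Illustrative arguments; the value 512m is an assumption, not taken from the logs.
good = ["-D", "taskmanager.memory.task.heap.size=512m"]
# A launch script that passes a stray token ahead of the dynamic configurations.
bad = ["unexpected-token"] + good

print(parse_dynamic_properties(good))  # the heap size key is present
print(parse_dynamic_properties(bad))   # empty: the key never reaches the TM
```

With the stray token in front, the toy parser returns nothing, which matches the symptom in the thread: 'taskmanager.memory.task.heap.size' is reported as not set even though the resource manager computed it. A custom launch script should therefore forward its arguments unchanged.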