BTW, the dynamic configs are also logged on the TM side [1]. It would be good to print them at INFO level there as well.
[1] https://github.com/apache/flink/blob/663af45c7f403eb6724852915bf2078241927258/flink-mesos/src/main/java/org/apache/flink/mesos/entrypoint/MesosTaskExecutorRunner.java#L77

Best,
Yangze Guo

On Thu, Mar 12, 2020 at 4:06 PM Yangze Guo <karma...@gmail.com> wrote:
>
> It seems we already have such logs in [1]. If that is the case, +1 for
> changing them to INFO level.
>
> [1]
> https://github.com/apache/flink/blob/663af45c7f403eb6724852915bf2078241927258/flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/LaunchableMesosWorker.java#L341
> Best,
> Yangze Guo
>
> On Thu, Mar 12, 2020 at 4:03 PM Alexander Kasyanenko
> <as.kasyane...@gmail.com> wrote:
> >
> > Instead of just launching the TM as it works right now, I suggest logging
> > the launch command first and then launching the TM. But that might be
> > unnecessary, since the use case is rather specific.
> >
> > Regards,
> > Alex.
> >
> > On Thu, Mar 12, 2020 at 16:58, Yangze Guo <karma...@gmail.com> wrote:
> >>
> >> Glad to hear that your issue is fixed.
> >> I'm not sure what you suggest adding. Could you be more specific,
> >> or create a Jira ticket?
> >>
> >> Best,
> >> Yangze Guo
> >>
> >>
> >> On Thu, Mar 12, 2020 at 3:51 PM Alexander Kasyanenko
> >> <as.kasyane...@gmail.com> wrote:
> >> >
> >> > Hi Yangze, Xintong,
> >> >
> >> > Thank you for the instant response.
> >> >
> >> > And big thanks for the hint on the TM launch command. It was indeed the
> >> > problem. I added my own custom mesos-taskmanager.sh to echo the
> >> > launch command (I had switched logging to DEBUG level, but it didn't
> >> > really display anything useful). May I suggest adding something like
> >> > this in future releases?
> >> >
> >> > As for my particular case, the issue was in the mesos-appmaster.sh option:
> >> >
> >> > -Dmesos.resourcemanager.tasks.taskmanager-cmd="/opt/job/custom_launch_tm.sh"
> >> >
> >> > My custom launch script was slicing the argument array incorrectly.
> >> >
> >> > Thanks for the help and regards,
> >> > Alex.
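Alex's suggestion (log the launch command, then launch) and Yangze's proposal to raise the log level can be sketched as a small wrapper that logs the full command at INFO before starting the process. This is a hypothetical helper, not Flink's launcher code: `LoggingLauncher`, `describe` and `logThenLaunch` are made-up names, and it uses `java.util.logging` only to stay self-contained (Flink itself logs through SLF4J).

```java
import java.io.IOException;
import java.util.List;
import java.util.StringJoiner;
import java.util.logging.Logger;

/**
 * Hypothetical wrapper: logs the full TaskManager launch command at INFO
 * before starting the process, so a mangled argument list is visible
 * without switching the whole cluster to DEBUG.
 */
public class LoggingLauncher {
    private static final Logger LOG = Logger.getLogger(LoggingLauncher.class.getName());

    /**
     * Renders the command on one line, wrapping empty arguments or arguments
     * with spaces in single quotes for readability (not a shell-escaping routine).
     */
    static String describe(List<String> command) {
        StringJoiner joiner = new StringJoiner(" ");
        for (String arg : command) {
            joiner.add(arg.isEmpty() || arg.contains(" ") ? "'" + arg + "'" : arg);
        }
        return joiner.toString();
    }

    /** Logs first, then launches; the log line is written even if the launch fails. */
    static Process logThenLaunch(List<String> command) throws IOException {
        LOG.info("Launching TaskManager with command: " + describe(command));
        return new ProcessBuilder(command).inheritIO().start();
    }

    public static void main(String[] args) {
        // Illustrative arguments only; this demo prints the description and does not launch anything.
        List<String> cmd = List.of("/opt/job/custom_launch_tm.sh", "-D", "taskmanager.numberOfTaskSlots=4");
        System.out.println(describe(cmd));
    }
}
```

Logging before `start()` is the point of the design: if the script then misbehaves, the exact argument list it received is already in the log.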
> >> >
> >> > On Thu, Mar 12, 2020 at 15:46, Xintong Song <tonysong...@gmail.com> wrote:
> >> >>
> >> >> Hi Alex,
> >> >>
> >> >> Could you check and post your TM launch command? I suspect that
> >> >> there might be some unrecognized arguments that prevent the rest of
> >> >> the arguments from being parsed.
> >> >>
> >> >> The TM memory configuration process works as follows:
> >> >>
> >> >> The resource manager parses the configuration, checks which
> >> >> options are configured and which are not, and calculates the size of
> >> >> each memory component. (This is where 'taskmanager.memory.process.size'
> >> >> is used.)
> >> >> After deriving the memory component sizes, the resource manager
> >> >> generates the launch command for the task managers, with dynamic
> >> >> configurations "-D <key=value>" overwriting the memory component sizes.
> >> >> Therefore, even if you have not configured
> >> >> 'taskmanager.memory.task.heap.size', this config option is expected
> >> >> to be available by the time the TM is launched.
> >> >> When a task manager starts, it does not do the calculations again;
> >> >> it directly reads the memory component sizes calculated by the
> >> >> resource manager from the dynamic configurations. That means it does
> >> >> not read 'taskmanager.memory.process.size' and derive the memory
> >> >> component sizes from it again.
> >> >>
> >> >> One thing that might have caused your problem is that when
> >> >> MesosTaskExecutorRunner parses the command line arguments (that's where
> >> >> the dynamic configurations are passed in), it stops parsing the rest of
> >> >> the arguments as soon as it meets an unrecognized token. That could be
> >> >> the reason 'taskmanager.memory.task.heap.size' is missing. You can
> >> >> take a look at the launch command and see if there's anything
> >> >> unexpected before the memory dynamic configurations.
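The hand-off Xintong describes, and how one stray token loses every memory config behind it, can be modeled in a few lines. This is a toy under stated assumptions: `DynamicConfigDemo` and its methods are hypothetical, the parser implements only the "stop at the first unrecognized token" behavior described above (it is not MesosTaskExecutorRunner's real parser), and the config values are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (not Flink's code): the RM turns computed memory sizes into
 * "-D key=value" arguments, and the TM-side parser stops at the first
 * token it does not recognize.
 */
public class DynamicConfigDemo {

    /** RM side: one "-D" flag followed by "key=value" per computed option. */
    static List<String> toDynamicArgs(Map<String, String> configs) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : configs.entrySet()) {
            args.add("-D");
            args.add(e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    /** TM side: collects "-D key=value" pairs and stops at the first unrecognized token. */
    static Map<String, String> parseUntilUnrecognized(List<String> args) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (int i = 0; i < args.size(); i++) {
            if (!args.get(i).equals("-D") || i + 1 >= args.size()) {
                break; // unrecognized (or dangling) token: everything after it is silently ignored
            }
            String pair = args.get(++i);
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                break; // a malformed pair is treated the same way
            }
            parsed.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return parsed;
    }

    /** Fails like the "required configuration option ... is not set" check, in spirit only. */
    static void checkRequired(Map<String, String> parsed, String key) {
        if (!parsed.containsKey(key)) {
            throw new IllegalStateException("The required configuration option " + key + " is not set");
        }
    }

    public static void main(String[] args) {
        Map<String, String> computed = new LinkedHashMap<>();
        computed.put("taskmanager.memory.framework.heap.size", "128m");
        computed.put("taskmanager.memory.task.heap.size", "512m"); // illustrative value

        List<String> good = toDynamicArgs(computed);
        List<String> bad = new ArrayList<>(List.of("stray-token")); // e.g. left over from bad slicing
        bad.addAll(good);

        checkRequired(parseUntilUnrecognized(good), "taskmanager.memory.task.heap.size"); // passes
        try {
            checkRequired(parseUntilUnrecognized(bad), "taskmanager.memory.task.heap.size");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` passes the check for the clean argument list and prints the "is not set" message for the list that starts with the stray token. A launch script that slices its argument array incorrectly could leave exactly such a token in front of the `-D` pairs.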
> >> >>
> >> >> Thank you~
> >> >>
> >> >> Xintong Song
> >> >>
> >> >>
> >> >>
> >> >> On Thu, Mar 12, 2020 at 2:26 PM Yangze Guo <karma...@gmail.com> wrote:
> >> >>>
> >> >>> Hi, Alexander
> >> >>>
> >> >>> I could not reproduce it in my local environment. Normally, the Mesos RM
> >> >>> calculates all the memory configs and adds them to the launch command.
> >> >>> Unfortunately, all the logs I could find for this command are at
> >> >>> DEBUG level. Would you mind changing the log level to DEBUG, or sharing
> >> >>> anything about the taskmanager launch command you can find in the
> >> >>> current log?
> >> >>>
> >> >>>
> >> >>> Best,
> >> >>> Yangze Guo
> >> >>>
> >> >>> On Thu, Mar 12, 2020 at 1:38 PM Alexander Kasyanenko
> >> >>> <as.kasyane...@gmail.com> wrote:
> >> >>> >
> >> >>> > Hi folks,
> >> >>> >
> >> >>> > I have a question related to the configuration of the new memory model
> >> >>> > introduced in Flink 1.10. Has anyone encountered a similar problem?
> >> >>> > I'm trying to use the taskmanager.memory.process.size configuration
> >> >>> > key with a Mesos session cluster, but I get an error like this:
> >> >>> >
> >> >>> > 2020-03-11 11:44:09,771 [main] ERROR org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Error while starting the TaskManager
> >> >>> > org.apache.flink.configuration.IllegalConfigurationException: Failed to create TaskExecutorResourceSpec
> >> >>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:72)
> >> >>> >     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.startTaskManager(TaskManagerRunner.java:356)
> >> >>> >     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.<init>(TaskManagerRunner.java:152)
> >> >>> >     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:308)
> >> >>> >     at org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner.lambda$main$0(MesosTaskExecutorRunner.java:106)
> >> >>> >     at java.base/java.security.AccessController.doPrivileged(Native Method)
> >> >>> >     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
> >> >>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
> >> >>> >     at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
> >> >>> >     at org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner.main(MesosTaskExecutorRunner.java:105)
> >> >>> > Caused by: org.apache.flink.configuration.IllegalConfigurationException: The required configuration option Key: 'taskmanager.memory.task.heap.size' , default: null (fallback keys: []) is not set
> >> >>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkConfigOptionIsSet(TaskExecutorResourceUtils.java:90)
> >> >>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.lambda$checkTaskExecutorResourceConfigSet$0(TaskExecutorResourceUtils.java:84)
> >> >>> >     at java.base/java.util.Arrays$ArrayList.forEach(Arrays.java:4390)
> >> >>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkTaskExecutorResourceConfigSet(TaskExecutorResourceUtils.java:84)
> >> >>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:70)
> >> >>> >     ... 9 more
> >> >>> >
> >> >>> > But when the task manager is launched, it correctly parses the process
> >> >>> > memory key:
> >> >>> >
> >> >>> > 2020-03-11 11:43:55,376 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - --------------------------------------------------------------------------------
> >> >>> > 2020-03-11 11:43:55,377 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Starting MesosTaskExecutorRunner (Version: 1.10.0, Rev:aa4eb8f, Date:07.02.2020 @ 19:18:19 CET)
> >> >>> > 2020-03-11 11:43:55,377 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - OS current user: root
> >> >>> > 2020-03-11 11:43:57,347 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >> >>> > 2020-03-11 11:43:57,535 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - JVM: OpenJDK 64-Bit Server VM - AdoptOpenJDK - 11/11.0.2+9
> >> >>> > 2020-03-11 11:43:57,535 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Maximum heap size: 746 MiBytes
> >> >>> > 2020-03-11 11:43:57,535 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - JAVA_HOME: (not set)
> >> >>> > 2020-03-11 11:43:57,539 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Hadoop version: 2.6.5
> >> >>> > 2020-03-11 11:43:57,539 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - JVM Options:
> >> >>> > 2020-03-11 11:43:57,539 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -Xmx781818251
> >> >>> > 2020-03-11 11:43:57,539 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -Xms781818251
> >> >>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -XX:MaxDirectMemorySize=317424929
> >> >>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -XX:MaxMetaspaceSize=100663296
> >> >>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -Dlog.file=/var/log/flink-session-cluster/taskmanager.log
> >> >>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -Dlog4j.configuration=file:/opt/flink/conf/log4j.properties
> >> >>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - -Dlogback.configurationFile=file:/opt/flink/conf/logback.xml
> >> >>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Program Arguments: (none)
> >> >>> > 2020-03-11 11:43:57,540 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Classpath: /opt/flink/lib/apache-log4j-extras-1.2.17.jar:/opt/flink/lib/flink-metrics-graphite-1.10.0.jar:/opt/flink/lib/flink-shaded-hadoop-2-uber-2.6.5-8.0.jar:/opt/flink/lib/flink-table-blink_2.12-1.10.0.jar:/opt/flink/lib/flink-table_2.12-1.10.0.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.15.jar:/opt/flink/lib/flink-dist_2.12-1.10.0.jar:
> >> >>> > 2020-03-11 11:43:57,541 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - --------------------------------------------------------------------------------
> >> >>> > 2020-03-11 11:43:57,542 [main] INFO org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner - Registered UNIX signal handlers for [TERM, HUP, INT]
> >> >>> > 2020-03-11 11:43:57,550 [main] INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.process.size, 2g
> >> >>> > 2020-03-11 11:43:57,550 [main] INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.cpu.cores, 2
> >> >>> > 2020-03-11 11:43:57,551 [main] INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 4
> >> >>> > 2020-03-11 11:43:57,551 [main] INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
> >> >>> > ...
> >> >>> >
> >> >>> > Judging by the docs, specifying the taskmanager.memory.process.size key
> >> >>> > should be enough to launch the job, but it seems like this value is
> >> >>> > ignored.
> >> >>> > I would appreciate any suggestions.
> >> >>> >
> >> >>> > Regards and thanks in advance,
> >> >>> > Alex.
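As a rough cross-check of the JVM options in the log above, here is a simplified re-derivation of the memory breakdown from `taskmanager.memory.process.size`, assuming every other memory option is left at its Flink 1.10 default (96m metaspace; JVM overhead 0.1 of process size within [192m, 1g]; 128m framework heap and off-heap; 0 task off-heap; managed 0.4 and network 0.1 of total Flink memory, network within [64m, 1g]). `MemoryBreakdown` and its methods are hypothetical, not Flink's own utilities, and Flink's rounding differs slightly, so the results are only expected to land within a few bytes of the logged values.

```java
/**
 * Simplified re-derivation of the Flink 1.10 memory breakdown from the total
 * process size, assuming all other memory options are at their 1.10 defaults.
 * Not Flink's implementation; rounding differs by a few bytes.
 */
public class MemoryBreakdown {
    static final long MIB = 1L << 20;

    // Assumed Flink 1.10 defaults.
    static final long METASPACE = 96 * MIB;
    static final double OVERHEAD_FRACTION = 0.1;
    static final long OVERHEAD_MIN = 192 * MIB, OVERHEAD_MAX = 1024 * MIB;
    static final long FRAMEWORK_HEAP = 128 * MIB, FRAMEWORK_OFF_HEAP = 128 * MIB;
    static final long TASK_OFF_HEAP = 0L;
    static final double MANAGED_FRACTION = 0.4;
    static final double NETWORK_FRACTION = 0.1;
    static final long NETWORK_MIN = 64 * MIB, NETWORK_MAX = 1024 * MIB;

    static long clamp(long v, long min, long max) {
        return Math.max(min, Math.min(max, v));
    }

    /** Total Flink memory = process size minus metaspace and JVM overhead. */
    static long totalFlink(long process) {
        long overhead = clamp((long) (process * OVERHEAD_FRACTION), OVERHEAD_MIN, OVERHEAD_MAX);
        return process - METASPACE - overhead;
    }

    static long network(long process) {
        return clamp((long) (totalFlink(process) * NETWORK_FRACTION), NETWORK_MIN, NETWORK_MAX);
    }

    static long managed(long process) {
        return (long) (totalFlink(process) * MANAGED_FRACTION);
    }

    /** Task heap is whatever remains after all other components are carved out. */
    static long taskHeap(long process) {
        return totalFlink(process) - FRAMEWORK_HEAP - FRAMEWORK_OFF_HEAP - TASK_OFF_HEAP
                - managed(process) - network(process);
    }

    /** Roughly what ends up in -Xmx / -Xms. */
    static long jvmHeap(long process) {
        return FRAMEWORK_HEAP + taskHeap(process);
    }

    /** Roughly what ends up in -XX:MaxDirectMemorySize. */
    static long maxDirect(long process) {
        return FRAMEWORK_OFF_HEAP + TASK_OFF_HEAP + network(process);
    }

    public static void main(String[] args) {
        long process = 2L << 30; // taskmanager.memory.process.size: 2g
        System.out.println("task heap       = " + taskHeap(process));
        System.out.println("-Xmx            = " + jvmHeap(process));
        System.out.println("MaxDirectMemory = " + maxDirect(process));
        System.out.println("MaxMetaspace    = " + METASPACE);
    }
}
```

For 2g this gives -Xmx 781818267 (logged: 781818251), MaxDirectMemorySize 317424926 (logged: 317424929) and MaxMetaspaceSize 100663296 (identical). That suggests the sizes were derived from `taskmanager.memory.process.size` as Xintong describes, and that only the `-D` dynamic configs went missing on the way to the TM; the `Program Arguments: (none)` line would fit that reading.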