It seems we already have such logs in [1]. If that is the case, +1 for
changing it to INFO level.

[1] https://github.com/apache/flink/blob/663af45c7f403eb6724852915bf2078241927258/flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/LaunchableMesosWorker.java#L341

Best,
Yangze Guo

On Thu, Mar 12, 2020 at 4:03 PM Alexander Kasyanenko
<as.kasyane...@gmail.com> wrote:
>
> Instead of just launching the TM as it works right now, I suggest logging 
> the launch command first, and then launching the TM. But that might be 
> unnecessary, since the use case is rather specific.
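
A minimal sketch of the "log first, then launch" idea, written as a shell helper. The function name `log_and_run` and the output format are hypothetical and are not part of Flink; it only prints the arguments it receives to stderr and then runs them unchanged.

```shell
# Hypothetical helper (not part of Flink): print every argument to stderr,
# then run the command exactly as it was passed in.
log_and_run() {
    printf 'Launch command:' >&2
    for arg in "$@"; do
        # Brackets make argument boundaries and empty arguments visible.
        printf ' [%s]' "$arg" >&2
    done
    printf '\n' >&2
    "$@"
}
```

A custom launch script could then end with something like `log_and_run /opt/flink/bin/mesos-taskmanager.sh "$@"`, where the install path is an assumption based on the /opt/flink paths in the log further down.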
>
> Regards,
> Alex.
>
> On Thu, Mar 12, 2020 at 4:58 PM Yangze Guo <karma...@gmail.com> wrote:
>>
>> Glad to hear that your issue is fixed.
>> I'm not sure what you are suggesting to add. Could you be more specific,
>> or create a Jira ticket?
>>
>> Best,
>> Yangze Guo
>>
>>
>> On Thu, Mar 12, 2020 at 3:51 PM Alexander Kasyanenko
>> <as.kasyane...@gmail.com> wrote:
>> >
>> > Hi Yangze, Xintong,
>> >
>> > Thank you for the instant response.
>> >
>> > And big thanks for the hint on the TM launch command. It was indeed the 
>> > problem. I added my own custom mesos-taskmanager.sh to echo the launch 
>> > command (I had switched logging to DEBUG level, but it didn't really 
>> > display anything useful). May I suggest adding something like this in 
>> > future releases?
>> >
>> > As for my particular case, the issue was in this mesos-appmaster.sh option:
>> >
>> > -Dmesos.resourcemanager.tasks.taskmanager-cmd="/opt/job/custom_launch_tm.sh"
>> >
>> > My custom launch script was slicing the argument array incorrectly.
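
The thread does not show what the custom script actually did, so the following is only a sketch of common ways a shell wrapper can mangle or preserve its arguments. All function names are hypothetical, and the default IFS is assumed.

```shell
# Helper that reports how many arguments it received.
count_args() { echo "$#"; }

# Fragile: unquoted $* is re-split on whitespace, so an argument that
# contains a space turns into several arguments.
forward_unquoted() { count_args $*; }

# Safer: "$@" forwards each argument exactly as it was received.
forward_quoted() { count_args "$@"; }

# Dropping one leading argument: shift it away, then forward the rest.
forward_rest() { shift; count_args "$@"; }
```

For example, `forward_unquoted -D "key=a b"` reports three arguments, while `forward_quoted -D "key=a b"` reports two.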
>> >
>> > Thanks for the help and regards,
>> > Alex.
>> >
>> > On Thu, Mar 12, 2020 at 3:46 PM Xintong Song <tonysong...@gmail.com> wrote:
>> >>
>> >> Hi Alex,
>> >>
>> >> Could you check and post your TM launch command? I suspect that there 
>> >> might be some unrecognized arguments that prevent the rest of the 
>> >> arguments from being parsed.
>> >>
>> >> The TM memory configuration process works as follows:
>> >>
>> >> 1. The resource manager parses the configuration, checks which options 
>> >>    are configured and which are not, and calculates the size of each 
>> >>    memory component. (This is where 'taskmanager.memory.process.size' 
>> >>    is used.)
>> >> 2. After deriving the memory component sizes, the resource manager 
>> >>    generates the launch command for the task managers, with dynamic 
>> >>    configurations "-D <key=value>" overwriting the memory component 
>> >>    sizes. Therefore, even if you have not configured 
>> >>    'taskmanager.memory.task.heap.size', this config option is expected 
>> >>    to be available by the time the TM is launched.
>> >> 3. When a task manager starts, it does not redo the calculations. It 
>> >>    directly reads the memory component sizes calculated by the resource 
>> >>    manager from the dynamic configurations. That means it does not read 
>> >>    'taskmanager.memory.process.size' and derive the memory component 
>> >>    sizes from it again.
>> >>
>> >> One thing that might have caused your problem is that when 
>> >> MesosTaskExecutorRunner parses the command line arguments (that's where 
>> >> the dynamic configurations are passed in), it stops parsing the rest of 
>> >> the arguments as soon as it meets an unrecognized token. That could be 
>> >> the reason 'taskmanager.memory.task.heap.size' is missing. Take a look 
>> >> at the launch command and see if there's anything unexpected before the 
>> >> memory dynamic configurations.
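
To illustrate the failure mode described above, here is a toy parser that stops at the first token it does not recognize. It is not Flink's parser, and the function name and the sample arguments in the usage note are made up. It only shows how one stray token can hide every "-D" option that follows it.

```shell
# Illustrative only: not Flink's parser. Prints each key=value that follows
# a -D flag and stops at the first token it does not recognize.
parse_dynamic_props() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -D)
                # "-D key=value": the property is the next argument.
                [ "$#" -ge 2 ] || break
                echo "$2"
                shift 2
                ;;
            -D*)
                # "-Dkey=value": the property is attached to the flag.
                echo "${1#-D}"
                shift
                ;;
            *)
                # Unrecognized token: everything after it is ignored.
                break
                ;;
        esac
    done
}
```

With the made-up input `parse_dynamic_props -D a=1 stray -D taskmanager.memory.task.heap.size=1g`, only `a=1` is printed. The heap size option after `stray` is never seen, which matches the symptom of a required key being reported as not set.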
>> >>
>> >> Thank you~
>> >>
>> >> Xintong Song
>> >>
>> >>
>> >>
>> >> On Thu, Mar 12, 2020 at 2:26 PM Yangze Guo <karma...@gmail.com> wrote:
>> >>>
>> >>> Hi, Alexander
>> >>>
>> >>> I could not reproduce it in my local environment. Normally, the Mesos RM
>> >>> calculates all the memory configs and adds them to the launch command.
>> >>> Unfortunately, all the logging I could find for this command is at the
>> >>> DEBUG level. Would you mind changing the log level to DEBUG, or sharing
>> >>> anything about the taskmanager launch command that you can find in the
>> >>> current log?
>> >>>
>> >>>
>> >>> Best,
>> >>> Yangze Guo
>> >>>
>> >>> On Thu, Mar 12, 2020 at 1:38 PM Alexander Kasyanenko
>> >>> <as.kasyane...@gmail.com> wrote:
>> >>> >
>> >>> > Hi folks,
>> >>> >
>> >>> > I have a question related to the new memory configuration introduced 
>> >>> > in Flink 1.10. Has anyone encountered a similar problem?
>> >>> > I'm trying to use the taskmanager.memory.process.size configuration 
>> >>> > key in combination with a Mesos session cluster, but I get an error 
>> >>> > like this:
>> >>> >
>> >>> > 2020-03-11 11:44:09,771 [main] ERROR org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     - Error while starting the TaskManager
>> >>> > org.apache.flink.configuration.IllegalConfigurationException: Failed to create TaskExecutorResourceSpec
>> >>> > at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:72)
>> >>> > at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.startTaskManager(TaskManagerRunner.java:356)
>> >>> > at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.<init>(TaskManagerRunner.java:152)
>> >>> > at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:308)
>> >>> > at org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner.lambda$main$0(MesosTaskExecutorRunner.java:106)
>> >>> > at java.base/java.security.AccessController.doPrivileged(Native Method)
>> >>> > at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>> >>> > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
>> >>> > at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>> >>> > at org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner.main(MesosTaskExecutorRunner.java:105)
>> >>> > Caused by: org.apache.flink.configuration.IllegalConfigurationException: The required configuration option Key: 'taskmanager.memory.task.heap.size' , default: null (fallback keys: []) is not set
>> >>> > at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkConfigOptionIsSet(TaskExecutorResourceUtils.java:90)
>> >>> > at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.lambda$checkTaskExecutorResourceConfigSet$0(TaskExecutorResourceUtils.java:84)
>> >>> > at java.base/java.util.Arrays$ArrayList.forEach(Arrays.java:4390)
>> >>> > at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkTaskExecutorResourceConfigSet(TaskExecutorResourceUtils.java:84)
>> >>> > at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:70)
>> >>> > ... 9 more
>> >>> >
>> >>> > But when the task manager is launched, it correctly parses the process 
>> >>> > memory key:
>> >>> >
>> >>> > 2020-03-11 11:43:55,376 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     - --------------------------------------------------------------------------------
>> >>> > 2020-03-11 11:43:55,377 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  Starting MesosTaskExecutorRunner (Version: 1.10.0, Rev:aa4eb8f, Date:07.02.2020 @ 19:18:19 CET)
>> >>> > 2020-03-11 11:43:55,377 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  OS current user: root
>> >>> > 2020-03-11 11:43:57,347 [main] WARN  org.apache.hadoop.util.NativeCodeLoader                       - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> >>> > 2020-03-11 11:43:57,535 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  JVM: OpenJDK 64-Bit Server VM - AdoptOpenJDK - 11/11.0.2+9
>> >>> > 2020-03-11 11:43:57,535 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  Maximum heap size: 746 MiBytes
>> >>> > 2020-03-11 11:43:57,535 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  JAVA_HOME: (not set)
>> >>> > 2020-03-11 11:43:57,539 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  Hadoop version: 2.6.5
>> >>> > 2020-03-11 11:43:57,539 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  JVM Options:
>> >>> > 2020-03-11 11:43:57,539 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -Xmx781818251
>> >>> > 2020-03-11 11:43:57,539 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -Xms781818251
>> >>> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -XX:MaxDirectMemorySize=317424929
>> >>> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -XX:MaxMetaspaceSize=100663296
>> >>> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -Dlog.file=/var/log/flink-session-cluster/taskmanager.log
>> >>> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -Dlog4j.configuration=file:/opt/flink/conf/log4j.properties
>> >>> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -Dlogback.configurationFile=file:/opt/flink/conf/logback.xml
>> >>> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  Program Arguments: (none)
>> >>> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  Classpath: /opt/flink/lib/apache-log4j-extras-1.2.17.jar:/opt/flink/lib/flink-metrics-graphite-1.10.0.jar:/opt/flink/lib/flink-shaded-hadoop-2-uber-2.6.5-8.0.jar:/opt/flink/lib/flink-table-blink_2.12-1.10.0.jar:/opt/flink/lib/flink-table_2.12-1.10.0.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.15.jar:/opt/flink/lib/flink-dist_2.12-1.10.0.jar:
>> >>> > 2020-03-11 11:43:57,541 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     - --------------------------------------------------------------------------------
>> >>> > 2020-03-11 11:43:57,542 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     - Registered UNIX signal handlers for [TERM, HUP, INT]
>> >>> > 2020-03-11 11:43:57,550 [main] INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.memory.process.size, 2g
>> >>> > 2020-03-11 11:43:57,550 [main] INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.cpu.cores, 2
>> >>> > 2020-03-11 11:43:57,551 [main] INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 4
>> >>> > 2020-03-11 11:43:57,551 [main] INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 1
>> >>> > ...
>> >>> >
>> >>> > Judging by the docs, specifying the taskmanager.memory.process.size 
>> >>> > key should be enough to launch the job, but it seems like this value 
>> >>> > is ignored.
>> >>> > I would appreciate any suggestions.
>> >>> >
>> >>> > Regards and thanks in advance,
>> >>> > Alex.
