Glad to hear that your issue is fixed.
I'm not sure what you are suggesting we add. Could you be more specific
or create a Jira ticket?

Best,
Yangze Guo


On Thu, Mar 12, 2020 at 3:51 PM Alexander Kasyanenko
<as.kasyane...@gmail.com> wrote:
>
> Hi Yangze, Xintong,
>
> Thank you for the quick response.
>
> And big thanks for the hint about the TM launch command. That was indeed the
> problem. I've added my own custom mesos-taskmanager.sh to echo the launch
> command (I had switched logging to DEBUG level, but it didn't display
> anything useful). May I suggest adding something like this in future
> releases?
>
> As for my particular case, the issue was in this mesos-appmaster.sh option:
>
> -Dmesos.resourcemanager.tasks.taskmanager-cmd="/opt/job/custom_launch_tm.sh"
>
> My custom launch script was slicing the argument array incorrectly.
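
A minimal sketch of a wrapper that forwards every argument unchanged and renders the full launch command for logging. It is written in Python purely for illustration; the thread does not show the actual shell script, so the function names and the target path are hypothetical.

```python
import shlex


def build_forwarded_command(target, argv):
    """Return the command a wrapper should run: the target program followed
    by every argument the wrapper itself received.

    argv[0] is the wrapper's own name, so only argv[1:] is forwarded.
    Slicing from any later index would silently drop arguments.
    """
    return [target] + list(argv[1:])


def describe(command):
    """Render the command as a single shell-quoted line, suitable for a log."""
    return shlex.join(command)


if __name__ == "__main__":
    # Hypothetical wrapper invocation; the target path is a placeholder.
    received = ["custom_launch_tm.sh", "-D", "some.key=some value"]
    command = build_forwarded_command("/path/to/real-launcher", received)
    print(describe(command))
```

Printing the rendered command before handing over to the real launcher makes a dropped or mangled argument visible without changing the log level.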
>
> Thanks for the help and regards,
> Alex.
>
> On Thu, Mar 12, 2020 at 15:46, Xintong Song <tonysong...@gmail.com> wrote:
>>
>> Hi Alex,
>>
>> Could you check and post your TM launch command? I suspect that there
>> might be some unrecognized arguments that prevent the rest of the arguments
>> from being parsed.
>>
>> The TM memory configuration process works as follows:
>>
>> 1. The resource manager parses the configuration, checks which options
>>    are configured and which are not, and calculates the size of each memory
>>    component. (This is where 'taskmanager.memory.process.size' is used.)
>> 2. After deriving the memory component sizes, the resource manager
>>    generates the launch command for the task managers, with dynamic
>>    configurations "-D <key=value>" overwriting the memory component sizes.
>>    Therefore, even if you have not configured
>>    'taskmanager.memory.task.heap.size', this option is expected to be set
>>    by the time the TM is launched.
>> 3. When a task manager starts, it does not repeat the calculation. It
>>    directly reads the memory component sizes calculated by the resource
>>    manager from the dynamic configurations. That means it does not read
>>    'taskmanager.memory.process.size' and derive the memory component sizes
>>    from it again.
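
The hand-off described above can be modelled roughly as follows. This is an illustrative Python sketch, not Flink code: the function names, the 2/5 split, and the reduced set of required keys are assumptions made for the example. Only the two option keys come from this thread.

```python
PROCESS_SIZE = "taskmanager.memory.process.size"
TASK_HEAP_SIZE = "taskmanager.memory.task.heap.size"

# Reduced, illustrative set; the real task manager requires more components.
REQUIRED_KEYS = [TASK_HEAP_SIZE]


def derive_components(config):
    """Resource-manager side: derive component sizes from the process size.

    The 2/5 fraction is made up for illustration and is NOT Flink's formula.
    """
    process_bytes = config[PROCESS_SIZE]
    return {TASK_HEAP_SIZE: process_bytes * 2 // 5}


def build_launch_args(config):
    """Resource-manager side: pass derived sizes as '-D key=value' arguments."""
    args = []
    for key, value in derive_components(config).items():
        args += ["-D", f"{key}={value}"]
    return args


def read_dynamic_properties(args):
    """Collect 'key=value' pairs that follow a '-D' token."""
    props = {}
    tokens = iter(args)
    for token in tokens:
        if token == "-D":
            key, _, value = next(tokens, "").partition("=")
            if key:
                props[key] = value
    return props


def start_task_manager(args):
    """Task-manager side: no recalculation, only read what was passed in."""
    props = read_dynamic_properties(args)
    for key in REQUIRED_KEYS:
        if key not in props:
            raise ValueError(f"The required configuration option {key} is not set")
    return {key: int(props[key]) for key in REQUIRED_KEYS}
```

In this model, a task manager started with an empty or truncated argument list fails on the missing heap-size key even though the process size is configured, because the task-manager side never falls back to the process size.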
>>
>> One thing that might have caused your problem is that, when
>> MesosTaskExecutorRunner parses the command line arguments (that's where the
>> dynamic configurations are passed in), it stops parsing the rest of the
>> arguments as soon as it encounters an unrecognized token. That could be why
>> 'taskmanager.memory.task.heap.size' is missing. You can take a look at the
>> launch command and see whether there's anything unexpected before the memory
>> dynamic configurations.
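
A minimal sketch of the failure mode described above, assuming a parser that stops at the first token it does not recognize. It is written in Python for illustration and does not reproduce the actual parsing code in MesosTaskExecutorRunner; `parse_until_unrecognized`, the `--bogus` token, and the value `1024` are hypothetical.

```python
def parse_until_unrecognized(args):
    """Parse '-D key=value' and '-Dkey=value' tokens from the front of args.

    Parsing stops at the first token that is not recognized. Everything from
    that token onward is returned untouched as the second element, so any
    dynamic properties located after it are never seen.
    """
    props = {}
    i = 0
    while i < len(args):
        token = args[i]
        if token == "-D" and i + 1 < len(args) and "=" in args[i + 1]:
            key, _, value = args[i + 1].partition("=")
            props[key] = value
            i += 2
        elif token.startswith("-D") and "=" in token:
            key, _, value = token[2:].partition("=")
            props[key] = value
            i += 1
        else:
            break  # unrecognized token: the rest is left unparsed
    return props, args[i:]


if __name__ == "__main__":
    memory_args = ["-D", "taskmanager.memory.task.heap.size=1024"]

    # Clean command line: the heap size is picked up.
    print(parse_until_unrecognized(memory_args))

    # An unexpected token in front hides the heap size from the parser.
    print(parse_until_unrecognized(["--bogus"] + memory_args))
```

The second call returns an empty property map, which in this model is how a required option could appear to be unset even though it is present further along the command line.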
>>
>> Thank you~
>>
>> Xintong Song
>>
>>
>>
>> On Thu, Mar 12, 2020 at 2:26 PM Yangze Guo <karma...@gmail.com> wrote:
>>>
>>> Hi Alexander,
>>>
>>> I could not reproduce it in my local environment. Normally, the Mesos RM
>>> calculates all the memory configs and adds them to the launch command.
>>> Unfortunately, the only logging I could find for this command is at the
>>> DEBUG level. Would you mind changing the log level to DEBUG, or sharing
>>> anything about the taskmanager launch command that you can find in the
>>> current log?
>>>
>>>
>>> Best,
>>> Yangze Guo
>>>
>>> On Thu, Mar 12, 2020 at 1:38 PM Alexander Kasyanenko
>>> <as.kasyane...@gmail.com> wrote:
>>> >
>>> > Hi folks,
>>> >
>>> > I have a question about the new memory configuration introduced in
>>> > Flink 1.10. Has anyone encountered a similar problem?
>>> > I'm trying to use the taskmanager.memory.process.size configuration
>>> > key in combination with a Mesos session cluster, but I get an error like
>>> > this:
>>> >
>>> > 2020-03-11 11:44:09,771 [main] ERROR org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     - Error while starting the TaskManager
>>> > org.apache.flink.configuration.IllegalConfigurationException: Failed to create TaskExecutorResourceSpec
>>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:72)
>>> >     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.startTaskManager(TaskManagerRunner.java:356)
>>> >     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.<init>(TaskManagerRunner.java:152)
>>> >     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:308)
>>> >     at org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner.lambda$main$0(MesosTaskExecutorRunner.java:106)
>>> >     at java.base/java.security.AccessController.doPrivileged(Native Method)
>>> >     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
>>> >     at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>>> >     at org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner.main(MesosTaskExecutorRunner.java:105)
>>> > Caused by: org.apache.flink.configuration.IllegalConfigurationException: The required configuration option Key: 'taskmanager.memory.task.heap.size' , default: null (fallback keys: []) is not set
>>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkConfigOptionIsSet(TaskExecutorResourceUtils.java:90)
>>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.lambda$checkTaskExecutorResourceConfigSet$0(TaskExecutorResourceUtils.java:84)
>>> >     at java.base/java.util.Arrays$ArrayList.forEach(Arrays.java:4390)
>>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkTaskExecutorResourceConfigSet(TaskExecutorResourceUtils.java:84)
>>> >     at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:70)
>>> >     ... 9 more
>>> >
>>> > But when the task manager is launched, it correctly parses the process memory key:
>>> >
>>> > 2020-03-11 11:43:55,376 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     - --------------------------------------------------------------------------------
>>> > 2020-03-11 11:43:55,377 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  Starting MesosTaskExecutorRunner (Version: 1.10.0, Rev:aa4eb8f, Date:07.02.2020 @ 19:18:19 CET)
>>> > 2020-03-11 11:43:55,377 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  OS current user: root
>>> > 2020-03-11 11:43:57,347 [main] WARN  org.apache.hadoop.util.NativeCodeLoader                       - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> > 2020-03-11 11:43:57,535 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  JVM: OpenJDK 64-Bit Server VM - AdoptOpenJDK - 11/11.0.2+9
>>> > 2020-03-11 11:43:57,535 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  Maximum heap size: 746 MiBytes
>>> > 2020-03-11 11:43:57,535 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  JAVA_HOME: (not set)
>>> > 2020-03-11 11:43:57,539 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  Hadoop version: 2.6.5
>>> > 2020-03-11 11:43:57,539 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  JVM Options:
>>> > 2020-03-11 11:43:57,539 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -Xmx781818251
>>> > 2020-03-11 11:43:57,539 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -Xms781818251
>>> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -XX:MaxDirectMemorySize=317424929
>>> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -XX:MaxMetaspaceSize=100663296
>>> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -Dlog.file=/var/log/flink-session-cluster/taskmanager.log
>>> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -Dlog4j.configuration=file:/opt/flink/conf/log4j.properties
>>> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -     -Dlogback.configurationFile=file:/opt/flink/conf/logback.xml
>>> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  Program Arguments: (none)
>>> > 2020-03-11 11:43:57,540 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     -  Classpath: /opt/flink/lib/apache-log4j-extras-1.2.17.jar:/opt/flink/lib/flink-metrics-graphite-1.10.0.jar:/opt/flink/lib/flink-shaded-hadoop-2-uber-2.6.5-8.0.jar:/opt/flink/lib/flink-table-blink_2.12-1.10.0.jar:/opt/flink/lib/flink-table_2.12-1.10.0.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.15.jar:/opt/flink/lib/flink-dist_2.12-1.10.0.jar:
>>> > 2020-03-11 11:43:57,541 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     - --------------------------------------------------------------------------------
>>> > 2020-03-11 11:43:57,542 [main] INFO  org.apache.flink.mesos.entrypoint.MesosTaskExecutorRunner     - Registered UNIX signal handlers for [TERM, HUP, INT]
>>> > 2020-03-11 11:43:57,550 [main] INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.memory.process.size, 2g
>>> > 2020-03-11 11:43:57,550 [main] INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.cpu.cores, 2
>>> > 2020-03-11 11:43:57,551 [main] INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 4
>>> > 2020-03-11 11:43:57,551 [main] INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 1
>>> > ...
>>> >
>>> > Judging by the docs, specifying the taskmanager.memory.process.size key
>>> > should be enough to launch the job, but it seems this value is ignored.
>>> > I would appreciate any suggestions.
>>> >
>>> > Regards and thanks in advance,
>>> > Alex.
