[ https://issues.apache.org/jira/browse/FLINK-23952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404845#comment-17404845 ]
Xintong Song edited comment on FLINK-23952 at 8/26/21, 2:32 AM:
----------------------------------------------------------------

h3. Why it worked fine in 1.13.1 but not in 1.13.2

By design, the cpu cores and all memory sizes should be calculated before starting the java process, and they should be explicitly set via configuration options. Notice that this could overwrite existing configurations. E.g., the user may configure a [min, max] range for the network memory size; Flink's automatic calculation logic then decides a specific value within that range and sets both the min and max config options to that value, making sure the value stays consistent during the entire lifecycle of the process.

There is internal logic inside the task manager that relies on the assumption that all cpu/memory config options are explicitly set. E.g., Flink uses the min value from the configuration as the network memory size, expecting max to be configured to the same value. However, Flink did not check whether all such options were explicitly configured. That explains why your scripts worked fine in 1.13.1. Although no serious problems were observed, the memory management may not have worked as designed/expected, in terms of stability and resource efficiency.

h3. Running flink with custom scripts

If the built-in scripts do not satisfy your demands, it should work to call BashJavaUtils from your custom scripts. The key point is to calculate and configure the resources in advance, consistently with what the other flink components expect (a minimal sketch is appended at the end of this comment).

However, as [~chesnay] mentioned, there's no guarantee that these things will stay compatible in future releases. They can be changed anytime without notice, which means you may run into this kind of problem again in the future. Alternatively, you may consider filing jira tickets for your complaints about the built-in scripts. That would be an appreciated contribution to the community.

BTW, I think taskmanager.sh does not need to read FLINK_PLUGINS_DIR, because it is exported as an environment variable and is read directly by the java process.
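To make the network-memory derivation described above concrete, here is a hypothetical example. The option keys are the real taskmanager.memory.network.* options; the sizes and the derived value are made up for illustration.

{code}
# flink-conf.yaml (user configuration): a range is left for the network memory
taskmanager.memory.process.size: 4096m
taskmanager.memory.network.min: 64mb
taskmanager.memory.network.max: 1gb

# What the startup scripts derive and append to the java command line: a single
# concrete value (driven by taskmanager.memory.network.fraction), with min == max,
# so the running process never has to resolve the range again. The number below
# is illustrative, not the real calculation result.
-D taskmanager.memory.network.min=343mb -D taskmanager.memory.network.max=343mb
{code}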
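For calling BashJavaUtils from a custom start script, a minimal sketch could look like the following. It roughly mirrors what bin/taskmanager.sh and bin/config.sh do in 1.13.x; the jar location, the entry class org.apache.flink.runtime.util.bash.BashJavaUtils, the GET_TM_RESOURCE_PARAMS command and the BASH_JAVA_UTILS_EXEC_RESULT: output prefix are internals of that version and may change, so please double-check against the config.sh shipped with your distribution.

{code:bash}
#!/usr/bin/env bash
# Sketch only: mirrors what bin/taskmanager.sh + bin/config.sh do in Flink 1.13.x.
# Paths, class name, command name and output prefix are assumptions; verify them
# against your distribution's bin/config.sh, since these internals can change.

FLINK_HOME="/opt/flink"                      # adjust to your installation
FLINK_CONF_DIR="${FLINK_HOME}/conf"
FLINK_LIB_DIR="${FLINK_HOME}/lib"
UTIL_JAR="${FLINK_HOME}/bin/bash-java-utils.jar"
DIST_JAR="$(ls "${FLINK_LIB_DIR}"/flink-dist*.jar | head -n 1)"

# 1. Ask Flink to compute the JVM sizes and the dynamic config options
#    (network min=max, managed memory size, cpu cores, ...).
output="$(java -classpath "${UTIL_JAR}:${DIST_JAR}" \
  org.apache.flink.runtime.util.bash.BashJavaUtils GET_TM_RESOURCE_PARAMS \
  --configDir "${FLINK_CONF_DIR}" 2>&1)"

# 2. Extract the two result lines: JVM parameters first, dynamic configs second.
results="$(echo "${output}" | grep 'BASH_JAVA_UTILS_EXEC_RESULT:' \
  | sed 's/^.*BASH_JAVA_UTILS_EXEC_RESULT://')"
jvm_params="$(echo "${results}" | head -n 1)"
dynamic_configs="$(echo "${results}" | tail -n 1)"

# 3. Launch the task manager with BOTH the JVM parameters and the dynamic configs,
#    so that every cpu/memory option is explicitly set, as the runtime expects.
exec java ${jvm_params} \
  -classpath "${FLINK_LIB_DIR}/*" \
  org.apache.flink.runtime.taskexecutor.TaskManagerRunner \
  --configDir "${FLINK_CONF_DIR}" ${dynamic_configs}
{code}

The point of the sketch is only the ordering: the resources are resolved first, and the resulting JVM parameters and -D options are both passed to the very same process that is started, which is what the other flink components expect.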
> Taskmanager fails to start complaining about missing configuration option
> -------------------------------------------------------------------------
>
>                 Key: FLINK-23952
>                 URL: https://issues.apache.org/jira/browse/FLINK-23952
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Configuration
>    Affects Versions: 1.13.2
>            Reporter: Leonid Ilyevsky
>            Priority: Major
>         Attachments: flink-conf.yaml, taskmanager.log, taskmanager_start.txt
>
> Taskmanager now fails to start, after I upgraded to 1.13.2. It worked fine in 1.13.1.
> It suddenly started complaining about missing configuration options that are not really required, according to documentation. When I tried to set the one it complained about, it started complaining about another one.
>
> Please see attached files:
> taskmanager_start.txt - actual command that is used to start the program
> flink-conf.yaml - configuration file
> taskmanager.log - logfile where you can see the exception

--
This message was sent by Atlassian Jira
(v8.3.4#803005)