Hi Jason,

I hope you don't mind that I brought the conversation back to the user@
mailing list, so that others can benefit from the information as well.

Thanks a lot for sharing your use case. I personally believe that Flink
should support invocations like "flink run -m yarn-cluster
xxx.FlinkStreamSQLDDLJob flink-stream-sql-ddl-1.0.0.jar ./config.json".
There is no fundamental reason why this cannot be supported.

The Javadoc about tableEnv.getConfig() mentions that the config is only
about the "runtime behavior":
https://github.com/apache/flink/blob/master/flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/api/TableEnvironment.java#L1151
... but I see how this is not clearly defined.

As a short-term fix, I've proposed to clarify in the configuration table
which options are cluster vs job configurations:
https://issues.apache.org/jira/browse/FLINK-22257.
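For illustration, here is a sketch (my wording, not the final FLINK-22257
text) of how the two categories could be distinguished, using two real
options:

```yaml
# Cluster option: read once at cluster startup. Changing it requires a
# cluster restart; set it in flink-conf.yaml or via `flink run -yD`.
taskmanager.memory.managed.fraction: 0.4

# Job option (mostly those in the "Execution" section): read when the
# job is planned, so it can also be set per job through the API.
table.exec.mini-batch.enabled: true
```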

But in the long term, we certainly need to improve the user experience.
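To make the cluster-vs-job distinction concrete, here is a deliberately
simplified plain-Java model (not Flink's real classes) of why an option
consumed at cluster startup cannot be changed by user code that runs
afterwards:

```java
// Simplified model (NOT Flink's real classes) of why a cluster-level
// option such as taskmanager.memory.managed.fraction, set from user
// code, cannot take effect: the cluster reads it once at startup,
// before any user main() runs.
import java.util.HashMap;
import java.util.Map;

public class Main {
    // Options loaded from flink-conf.yaml / `flink run -yD` at startup.
    static final Map<String, String> clusterConf = new HashMap<>();
    static String managedFraction;

    // Phase 1: cluster startup consumes memory options exactly once.
    static void startCluster() {
        managedFraction =
            clusterConf.getOrDefault("taskmanager.memory.managed.fraction", "0.4");
    }

    // Phase 2: the user's job code mutates its own configuration object,
    // which is too late for anything consumed in phase 1.
    static String runUserJob(Map<String, String> jobConf) {
        jobConf.put("taskmanager.memory.managed.fraction", "0.7"); // silently ignored
        return managedFraction; // still whatever the cluster started with
    }

    public static void main(String[] args) {
        startCluster();
        System.out.println("effective fraction: " + runUserJob(new HashMap<>()));
    }
}
```

In real Flink the same timing applies: memory options are consumed when
the TaskManager process starts, while job options (e.g. those in the
"Execution" section) are read during job planning and can still be set
from code.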


On Wed, Jun 16, 2021 at 3:31 PM Jason Lee <jasonlee1...@163.com> wrote:

> Dear Robert,
>
> For tasks running on the cluster, some parameter configurations are
> global, but others need to be customized per task, such as some memory
> settings of the TaskManager. Tasks with different state sizes need
> different parameters, and these should not be configured in
> flink-conf.yaml. But in current Flink, these parameters cannot be
> configured through StreamExecutionEnvironment, and some are not
> effective if set through StreamTableEnvironment.
>
> At the same time, the Configuration being immutable after the task is
> started is correct, but I think some global parameters should also be
> settable through StreamExecutionEnvironment. At present, some checkpoint
> parameters are also global, yet they can be set through
> "StreamExecutionEnvironment.getCheckpointConfig().set()", so why can't
> the TaskManager memory parameters be set in the same way? I think
> setting global parameters via "flink run -yD" should be equivalent to
> setting them in "StreamExecutionEnvironment". I am not sure if I
> understand this correctly.
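>
> For illustration, a sketch of this asymmetry using the standard
> checkpoint APIs (whether each setting takes effect depends on when it
> is read):
>
> ```java
> StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
> // Checkpoint parameters are also "global", yet the API exposes per-job setters:
> env.enableCheckpointing(60_000);
> env.getCheckpointConfig().setCheckpointTimeout(120_000);
> // There is no equivalent setter for TaskManager memory options such as
> // taskmanager.memory.managed.fraction; they only work via
> // flink-conf.yaml or "flink run -yD".
> ```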
>
> I agree with you. I think the Configuration section of the official
> documentation should specify which parameters are best configured in
> flink-conf.yaml, which can be modified through
> "StreamExecutionEnvironment", and which can only be modified in other
> ways. I think that would make the documentation clearer for users.
>
> Best,
> Jason
> JasonLee1781
> jasonlee1...@163.com
>
>
> On 06/16/2021 21:04, Jason Lee <jasonlee1...@163.com> wrote:
>
> Dear Robert,
>
> Thanks for your answer
>
> Our Flink SQL tasks are deployed in per-job mode.
>
> We provide our users with a platform for developing Flink SQL tasks. We
> write the user's SQL code and configuration parameters into a
> config.json file, and underneath we developed a Flink jar job that
> actually executes the user's SQL from the command line. For example,
> the following is our command to start a Flink SQL task: "flink run -m
> yarn-cluster xxx.FlinkStreamSQLDDLJob flink-stream-sql-ddl-1.0.0.jar
> ./config.json". To make personalized configuration convenient for
> users, we want to set the user's configuration parameters in the
> execution environment of our FlinkStreamSQLDDLJob class, for example
> the "taskmanager.memory.managed.fraction" parameter. But currently
> these parameters cannot be configured through the Flink execution
> environment, because they do not take effect there; they can only be
> configured via flink run -yD.
>
> I think the Configuration section of the official documentation should
> state which parameters cannot be set through
> "StreamTableEnvironment.getConfig.getConfiguration().set()" and can
> only be set through flink run -yD or flink-conf.yaml. If the
> documentation does not explain this, users will not realize that some
> parameters set with
> "StreamTableEnvironment.getConfig.getConfiguration().set()" simply do
> not take effect. To make personalized configuration easier for users, I
> think these instructions should appear in the Configuration section of
> the official documentation.
>
> Best,
> Jason
>
>
> On 06/16/2021 18:37, Robert Metzger <rmetz...@apache.org> wrote:
>
> Hi Jason,
>
> How are you deploying your Flink SQL tasks? (Are you using
> per-job/application clusters, or a session cluster?)
>
> I agree that the configuration management is not optimal in Flink. By
> default, I would recommend assuming that all configuration parameters are
> cluster settings, which require a cluster restart. Very few options (mostly
> those listed in the "Execution" section) are job settings, which can be set
> for each job.
>
> Would it help if the table of configuration options in the
> documentation tagged each option (with "Cluster" and "Job" option
> types)?
> Secondly, the API should probably only expose an immutable Configuration
> object, if the configuration is effectively immutable. I believe the option
> to set configuration on the (Stream)(Table)Environment is mostly there for
> local execution of Flink.
>
>
> 2. I agree, the docs are incomplete here (probably another symptom of the
> fact that the whole configuration management in Flink is not optimal). I
> see what I can do to improve the situation.
>
> 3. Except for local execution (everything runs in one JVM), I don't think
> we'll add support for this anytime soon. Some of the cluster configuration
> parameters just have to be global (like memory management), as they apply
> to all jobs executed on a cluster.
>
>
> This ticket could be related to your problems:
> https://issues.apache.org/jira/browse/FLINK-21065
>
> Let us know how you are deploying your Flink jobs, this will shed some
> more light on the discussion!
>
> Best,
> Robert
>
>
> On Wed, Jun 16, 2021 at 4:27 AM Jason Lee <jasonlee1...@163.com> wrote:
>
>> Hi everyone,
>>
>> When I was researching and using Flink recently, I found that the
>> official documentation on how to configure parameters is confusing,
>> and when I set parameters in some ways, they do not take effect,
>> mainly as follows:
>>
>> We usually use a DDL jar package to execute Flink SQL tasks, but we
>> found that some parameters set through
>> StreamTableEnvironment.getConfig().getConfiguration().setXXX(key, value)
>> cannot take effect. For example, taskmanager.memory.managed.fraction
>> does not take effect if set in this way (the note in TableConfig in
>> the source code says: "Because options are read at different point in
>> time when performing operations, it is recommended to set
>> configuration options early after instantiating a table
>> environment."). Also, StreamExecutionEnvironment.getConfiguration() is
>> protected, which means some parameters cannot be set through the API
>> at all. I feel this is not reasonable, because sometimes we want to
>> configure different parameters for different tasks via
>> Configuration.setXXX(key, value) in the API, instead of only through
>> flink run -yD or flink-conf.yaml.
>>
>> The Configuration section of the official documentation introduces the
>> description and default value of each parameter, but there is no
>> introduction to how each parameter can be set. I think this is not
>> friendly enough for users, especially users who want to personalize
>> some parameters; the setting methods could be described in the
>> official documentation.
>>
>> In summary, for some normal tasks we can use the default parameter
>> configuration, but for some tasks that require personalized configuration,
>> especially Flink SQL tasks, I have a few suggestions on the use of
>> configuration:
>>
>> 1. Regarding the API: when parameters are configured through
>> StreamTableEnvironment.getConfig().getConfiguration().setXXX(key, value),
>> the documentation should explain separately which parameters do not
>> take effect this way. Otherwise, some parameters configured this way
>> will silently have no effect, which causes confusion for users.
>>
>> 2. In the official documentation, I think it is necessary to add
>> instructions on how each parameter can be configured. For example, an
>> option may be configurable not only in flink-conf.yaml but also on the
>> run command through flink run -yD, or in other ways.
>>
>> 3. Regarding StreamExecutionEnvironment.getConfiguration() being
>> protected: will the community change this in later versions? Is there
>> an effective way for users to set some parameters through the API and
>> have them take effect, such as the
>> taskmanager.memory.managed.fraction parameter?
>>
>> Regarding the issues above, and why the parameter settings do not take
>> effect: maybe I did not describe them clearly enough, or I did not
>> understand the problem completely. I hope to get a reply from the
>> community and discuss this further.
>>
>> Best,
>> Jason
>>
>> 李闯
>> jasonlee1...@163.com
>>
>>
>>
