Do you also want to answer
https://stackoverflow.com/questions/62562153/apache-flink-and-pureconfig-passing-java-properties-on-job-startup ?

Your suggestion seems to work well.

Best,
Georg

On Thu, Jun 25, 2020 at 3:32 PM Arvid Heise <ar...@ververica.com> wrote:

> You are welcome.
>
> I'm not an expert on the YARN executor, but I hope that
>
>      -yt,--yarnship <arg>                 Ship files in the specified directory
>                                           (t for transfer)
>
> can help [1]. Oddly, this option is not listed on the YARN page, but it
> should be available since it's also used in the SSL setup [2].
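>
> A rough sketch of how -yt and -yD might combine (untested; I'm assuming
> the shipped directory's files land in the containers' working directory,
> so a relative path may resolve):
>
> flink run -m yarn-cluster \
>     -yt config/jobs \
>     -yD env.java.opts="-Dconfig.file=twitter-analysis.conf" \
>     --class com.github.geoheil.streamingreference.tweets.TweetsAnalysis \
>     "usecases/tweets/build/libs/tweets_${SCALA_VERSION}-${VERSION}-all.jar"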
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/cli.html
> [2] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/security-ssl.html#tips-for-yarn--mesos-deployment
>
>
> On Thu, Jun 25, 2020 at 3:23 PM Georg Heiler <georg.kf.hei...@gmail.com>
> wrote:
>
>> Thanks a lot!
>> Your point is right.
>>
>> To make the comparison fair, the mental model should be one cluster per job.
>>
>> In particular for YARN:
>>
>> -yD env.java.opts="-Dconfig.file='config/jobs/twitter-analysis.conf'"
>>
>> You mentioned that the path must be accessible. Spark has a --files
>> parameter, and the local file is then automatically copied to the root of
>> the YARN container. Is something similar available in Flink?
>>
>>
>> Best,
>> Georg
>>
>> On Thu, Jun 25, 2020 at 2:58 PM Arvid Heise <ar...@ververica.com> wrote:
>>
>>> Hi Georg,
>>>
>>> I think there is a conceptual misunderstanding. If you reuse the cluster
>>> for several jobs, they need to share the JVM_ARGS since it's the same
>>> process [1]. In Spark, new processes are spawned for each stage, afaik.
>>>
>>> However, the current recommendation is to use only one ad-hoc cluster
>>> per job/application (which is closer to how Spark works). So if you use
>>> YARN, every job/application spawns a new cluster that just has the right
>>> size for it. Then you can supply new parameters for new YARN submission
>>> with
>>>
>>> flink run -m yarn-cluster \
>>>     -yD env.java.opts="-Dconfig.file='config/jobs/twitter-analysis.conf'" \
>>>     --class com.github.geoheil.streamingreference.tweets.TweetsAnalysis \
>>>     "usecases/tweets/build/libs/tweets_${SCALA_VERSION}-${VERSION}-all.jar"
>>>
>>> However, make sure that the path is accessible from within your YARN
>>> cluster, since the driver is probably executed on the cluster (not 100%
>>> sure).
>>>
>>>
>>> If you want per-job configuration on a shared cluster, I'd recommend
>>> using normal program arguments and initializing PureConfig manually
>>> (I haven't used it, so I'm not sure how). Then you'd probably invoke your
>>> program as follows.
>>>
>>> flink run \
>>>     --class com.github.geoheil.streamingreference.tweets.TweetsAnalysis \
>>>     "usecases/tweets/build/libs/tweets_${SCALA_VERSION}-${VERSION}-all.jar" \
>>>     config.file='config/jobs/twitter-analysis.conf'
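>>>
>>> For illustration, a minimal sketch of such a manual initialization
>>> (untested; assumes PureConfig 0.12+ with auto derivation, and the config
>>> fields are made up):
>>>
>>> import pureconfig.ConfigSource
>>> import pureconfig.generic.auto._
>>>
>>> // Hypothetical config shape -- adjust to your actual job settings.
>>> final case class JobConfig(inputTopic: String, parallelism: Int)
>>>
>>> object TweetsAnalysis {
>>>   def main(args: Array[String]): Unit = {
>>>     // Expect the config path as a plain program argument,
>>>     // e.g. config.file=config/jobs/twitter-analysis.conf
>>>     val path = args
>>>       .collectFirst { case a if a.startsWith("config.file=") => a.stripPrefix("config.file=") }
>>>       .getOrElse(sys.error("missing config.file=<path> argument"))
>>>     val config = ConfigSource.file(path).loadOrThrow[JobConfig]
>>>     // ... build and execute the Flink job using `config` ...
>>>   }
>>> }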
>>>
>>>
>>> For local execution, I had some trouble configuring it as well (I tried
>>> it with your code). The issue is that all the parameters we tried before
>>> are only passed to newly spawned processes, while your code is executed
>>> directly in the CLI process.
>>>
>>>
>>> FLINK_ENV_JAVA_OPTS=-Dconfig.file="`pwd`/config/jobs/twitter-analysis.conf" \
>>> flink run \
>>>     --class com.github.geoheil.streamingreference.tweets.TweetsAnalysis \
>>>     "usecases/tweets/build/libs/tweets_${SCALA_VERSION}-${VERSION}-all.jar"
>>>
>>> FLINK_ENV_JAVA_OPTS is usually parsed from flink-conf.yaml using the
>>> env.java.opts key, but it doesn't respect -Denv.java.opts on the command
>>> line. I'm not sure whether this is intentional.
>>>
>>>
>>> If you put env.java.opts into flink-conf.yaml, it would most likely work
>>> for both YARN and local execution. With FLINK_CONF_DIR you can set a
>>> different conf dir per job. Alternatively, you could also specify both
>>> FLINK_ENV_JAVA_OPTS and -yD to inject the property.
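>>>
>>> For example (untested sketch; all paths are illustrative):
>>>
>>> # one conf dir per job, derived from the default one
>>> cp -r "$FLINK_HOME/conf" /tmp/twitter-conf
>>> echo 'env.java.opts: -Dconfig.file=/path/to/twitter-analysis.conf' \
>>>     >> /tmp/twitter-conf/flink-conf.yaml
>>> FLINK_CONF_DIR=/tmp/twitter-conf flink run ...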
>>>
>>>
>>> [1] https://stackoverflow.com/a/33855802/10299342
>>>
>>> On Thu, Jun 25, 2020 at 12:49 PM Georg Heiler <georg.kf.hei...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> but how can I change/configure it per submitted job and not for the
>>>> whole cluster?
>>>>
>>>> Best,
>>>> Georg
>>>>
>>>> On Thu, Jun 25, 2020 at 10:07 AM Arvid Heise <ar...@ververica.com> wrote:
>>>>
>>>>> Hi Georg,
>>>>>
>>>>> thank you for your detailed explanation. You want to use env.java.opts
>>>>> [1]. There are variants if you want to make it available only on the job
>>>>> manager or only on the task managers, but I guess the basic form is good
>>>>> enough for you.
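>>>>>
>>>>> For example, in flink-conf.yaml (the path is illustrative):
>>>>>
>>>>> env.java.opts: -Dconfig.file=/path/to/twitter-analysis.conf
>>>>> # or, to target only one side:
>>>>> # env.java.opts.jobmanager: ...
>>>>> # env.java.opts.taskmanager: ...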
>>>>>
>>>>> [1]
>>>>> https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#jvm-and-logging-options
>>>>>
>>>>> On Wed, Jun 24, 2020 at 10:52 PM Georg Heiler <
>>>>> georg.kf.hei...@gmail.com> wrote:
>>>>>
>>>>>> Hi Arvid,
>>>>>>
>>>>>> thanks for the quick reply. I have a strong Apache Spark background.
>>>>>> There, when executing on YARN or locally, the cluster is usually created
>>>>>> on demand for the duration of the batch/streaming job, and there are
>>>>>> only three concepts: A) the master (application master), B) the
>>>>>> slaves/executors, and C) the driver, the node where the main class is
>>>>>> invoked. In Spark's notion, I want the -D parameter to be available on
>>>>>> the (C) driver node. Translating this to Flink, I want it to be
>>>>>> available to the main class that is invoked when the job is
>>>>>> submitted/started by the job manager (which should be equivalent to the
>>>>>> driver).
>>>>>>
>>>>>> But maybe my understanding of Flink is not 100% correct yet.
>>>>>>
>>>>>> Unfortunately, using -D directly is not working.
>>>>>>
>>>>>> Best,
>>>>>> Georg
>>>>>>
>>>>>> On Wed, Jun 24, 2020 at 10:13 PM Arvid Heise <ar...@ververica.com> wrote:
>>>>>>
>>>>>>> Hi Georg,
>>>>>>>
>>>>>>> could you check if simply using -D works, as described here [1]?
>>>>>>>
>>>>>>> If not, could you please be more precise: do you want the parameter
>>>>>>> to be passed to the driver, the job manager, or the task managers?
>>>>>>>
>>>>>>> [1]
>>>>>>> https://ci.apache.org/projects/flink/flink-docs-master/ops/cli.html#deployment-targets
>>>>>>>
>>>>>>> On Wed, Jun 24, 2020 at 8:55 PM Georg Heiler <
>>>>>>> georg.kf.hei...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> how can I pass additional configuration parameters, like Spark's
>>>>>>>> extraJavaOptions, to a Flink job?
>>>>>>>>
>>>>>>>>
>>>>>>>> https://stackoverflow.com/questions/62562153/apache-flink-and-pureconfig-passing-java-properties-on-job-startup
>>>>>>>>
>>>>>>>> contains the details. But the gist is:
>>>>>>>> flink run \
>>>>>>>>     --class com.github.geoheil.streamingreference.tweets.TweetsAnalysis \
>>>>>>>>     "usecases/tweets/build/libs/tweets_${SCALA_VERSION}-${VERSION}-all.jar" \
>>>>>>>>     -yD env.java.opts="-Dconfig.file='config/jobs/twitter-analysis.conf'"
>>>>>>>>
>>>>>>>> does not pass -Dconfig.file to the Flink job!
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Georg
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
> --
>
> Arvid Heise | Senior Java Developer
>
> <https://www.ververica.com/>
>
> Follow us @VervericaData
>
> --
>
> Join Flink Forward <https://flink-forward.org/> - The Apache Flink
> Conference
>
> Stream Processing | Event Driven | Real Time
>
> --
>
> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>
> --
> Ververica GmbH
> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
> (Toni) Cheng
>
