Thank you Till!

On Fri, Apr 9, 2021 at 4:25 PM Till Rohrmann <trohrm...@apache.org> wrote:
> Hi Yik San,
>
> (1) You could do the same with Kafka. For Hive I believe that the
> dependency is simply quite large, so it hurts more if you bundle it with
> your user code.
>
> (2) If you change the content of the lib directory, then you have to
> restart the cluster.
>
> Cheers,
> Till
>
> On Fri, Apr 9, 2021 at 4:02 AM Yik San Chan <evan.chanyik...@gmail.com>
> wrote:
>
>> Hi Till, I have two follow-ups.
>>
>> (1) Why is Hive special, while for connectors such as Kafka, the docs
>> suggest simply bundling the Kafka connector dependency with my user code?
>>
>> (2) It seems the documentation misses the "before you start the cluster"
>> part - does it always require a cluster restart whenever the /lib
>> directory changes?
>>
>> Thanks.
>>
>> Best,
>> Yik San
>>
>> On Fri, Apr 9, 2021 at 1:07 AM Till Rohrmann <trohrm...@apache.org>
>> wrote:
>>
>>> Hi Yik San,
>>>
>>> for future reference, I copy my answer from the SO here:
>>>
>>> The reason for this difference is that for Hive it is recommended to
>>> start the cluster with the respective Hive dependencies. The
>>> documentation [1] states that it's best to put the dependencies into
>>> the lib directory before you start the cluster. That way the cluster is
>>> enabled to run jobs which use Hive. At the same time, you don't have to
>>> bundle this dependency in the user jar, which reduces its size.
>>> However, there shouldn't be anything preventing you from bundling the
>>> Hive dependency with your user code if you want to.
>>>
>>> [1]
>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/hive/#dependencies
>>>
>>> Cheers,
>>> Till
>>>
>>> On Thu, Apr 8, 2021 at 11:41 AM Yik San Chan <evan.chanyik...@gmail.com>
>>> wrote:
>>>
>>>> The question is cross-posted on Stack Overflow
>>>> https://stackoverflow.com/questions/67001326/why-does-flink-quickstart-scala-suggests-adding-connector-dependencies-in-the-de
>>>> .
>>>>
>>>> ## Connector dependencies should be in default scope
>>>>
>>>> This is what [flink-quickstart-scala](
>>>> https://github.com/apache/flink/blob/d12eeedfac6541c3a0711d1580ce3bd68120ca90/flink-quickstart/flink-quickstart-scala/src/main/resources/archetype-resources/pom.xml#L84)
>>>> suggests:
>>>>
>>>> ```
>>>> <!-- Add connector dependencies here. They must be in the default scope
>>>> (compile). -->
>>>>
>>>> <!-- Example:
>>>> <dependency>
>>>>     <groupId>org.apache.flink</groupId>
>>>>     <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
>>>>     <version>${flink.version}</version>
>>>> </dependency>
>>>> -->
>>>> ```
>>>>
>>>> It also aligns with [Flink project configuration](
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/project-configuration.html#adding-connector-and-library-dependencies
>>>> ):
>>>>
>>>> > We recommend packaging the application code and all its required
>>>> > dependencies into one jar-with-dependencies which we refer to as the
>>>> > application jar. The application jar can be submitted to an already
>>>> > running Flink cluster, or added to a Flink application container
>>>> > image.
>>>> >
>>>> > Important: For Maven (and other build tools) to correctly package
>>>> > the dependencies into the application jar, these application
>>>> > dependencies must be specified in scope compile (unlike the core
>>>> > dependencies, which must be specified in scope provided).
>>>>
>>>> ## Hive connector dependencies should be in provided scope
>>>>
>>>> However, the [Flink Hive Integration docs](
>>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/hive/#program-maven)
>>>> suggest the opposite:
>>>>
>>>> > If you are building your own program, you need the following
>>>> > dependencies in your mvn file. It’s recommended not to include these
>>>> > dependencies in the resulting jar file. You’re supposed to add
>>>> > dependencies as stated above at runtime.
>>>>
>>>> ## Why?
>>>>
>>>> Thanks!
>>>>
>>>> Best,
>>>> Yik San
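[Editor's note] To make the scope contrast from the question concrete, a minimal pom fragment might look like the following. This is a sketch assuming the usual `flink.version`/`scala.binary.version` properties are defined in the pom; artifact names follow the Flink 1.12 conventions shown in the quoted docs.

```xml
<!-- Bundled into the application jar: default (compile) scope. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
</dependency>

<!-- Expected on the cluster at runtime (from Flink's lib/ directory):
     provided scope, so the shade/assembly plugin leaves it out of the
     application jar, keeping the jar small. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-hive_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
```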
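[Editor's note] The lib-directory setup Till describes can be sketched as below. The jar name follows the naming scheme of Flink's pre-bundled Hive connectors for Flink 1.12 and is an assumption here; pick the artifact matching your Hive/Scala/Flink versions. A scratch directory stands in for a real FLINK_HOME so the sketch runs end-to-end.

```shell
# Sketch only: paths and the jar name are assumptions, not exact instructions.
# On a real installation FLINK_HOME would be e.g. /opt/flink; a scratch
# directory is used here so the sketch is self-contained.
FLINK_HOME="${FLINK_HOME:-$(mktemp -d)}"
mkdir -p "$FLINK_HOME/lib"

# Stand-in for a pre-bundled Hive connector jar downloaded from Maven Central.
JAR=flink-sql-connector-hive-2.3.6_2.11-1.12.0.jar
touch "$JAR"

# Put the dependency into lib/ *before* starting the cluster. lib/ is only
# scanned at startup, so after any change to it, restart the cluster:
#   $FLINK_HOME/bin/stop-cluster.sh && $FLINK_HOME/bin/start-cluster.sh
cp "$JAR" "$FLINK_HOME/lib/"
```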