Thank you Till!

On Fri, Apr 9, 2021 at 4:25 PM Till Rohrmann <trohrm...@apache.org> wrote:
> Hi Yik San,
>
> (1) You could do the same with Kafka. For Hive I believe that the
> dependency is simply quite large, so it hurts more if you bundle it with
> your user code.
>
> (2) If you change the content of the lib directory, then you have to
> restart the cluster.
>
> Cheers,
> Till
>
> On Fri, Apr 9, 2021 at 4:02 AM Yik San Chan <evan.chanyik...@gmail.com>
> wrote:
>
>> Hi Till, I have two follow-ups.
>>
>> (1) Why is Hive special, while for connectors such as Kafka, the docs
>> suggest simply bundling the Kafka connector dependency with my user code?
>>
>> (2) It seems the documentation misses the "before you start the cluster"
>> part - does it always require a cluster restart whenever the /lib
>> directory changes?
>>
>> Thanks.
>>
>> Best,
>> Yik San
>>
>> On Fri, Apr 9, 2021 at 1:07 AM Till Rohrmann <trohrm...@apache.org>
>> wrote:
>>
>>> Hi Yik San,
>>>
>>> for future reference, I copy my answer from the SO here:
>>>
>>> The reason for this difference is that for Hive it is recommended to
>>> start the cluster with the respective Hive dependencies. The
>>> documentation [1] states that it's best to put the dependencies into
>>> the lib directory before you start the cluster. That way the cluster is
>>> enabled to run jobs which use Hive. At the same time, you don't have to
>>> bundle this dependency in the user jar, which reduces its size.
>>> However, there shouldn't be anything preventing you from bundling the
>>> Hive dependency with your user code if you want to.
>>>
>>> [1]
>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/hive/#dependencies
>>>
>>> Cheers,
>>> Till
>>>
>>> On Thu, Apr 8, 2021 at 11:41 AM Yik San Chan <evan.chanyik...@gmail.com>
>>> wrote:
>>>
>>>> The question is cross-posted on Stack Overflow
>>>> https://stackoverflow.com/questions/67001326/why-does-flink-quickstart-scala-suggests-adding-connector-dependencies-in-the-de
>>>> .
>>>>
>>>> ## Connector dependencies should be in default scope
>>>>
>>>> This is what [flink-quickstart-scala](
>>>> https://github.com/apache/flink/blob/d12eeedfac6541c3a0711d1580ce3bd68120ca90/flink-quickstart/flink-quickstart-scala/src/main/resources/archetype-resources/pom.xml#L84)
>>>> suggests:
>>>>
>>>> ```
>>>> <!-- Add connector dependencies here. They must be in the default scope
>>>> (compile). -->
>>>>
>>>> <!-- Example:
>>>> <dependency>
>>>>     <groupId>org.apache.flink</groupId>
>>>>     <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
>>>>     <version>${flink.version}</version>
>>>> </dependency>
>>>> -->
>>>> ```
>>>>
>>>> It also aligns with [Flink project configuration](
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/project-configuration.html#adding-connector-and-library-dependencies
>>>> ):
>>>>
>>>> > We recommend packaging the application code and all its required
>>>> > dependencies into one jar-with-dependencies which we refer to as the
>>>> > application jar. The application jar can be submitted to an already
>>>> > running Flink cluster, or added to a Flink application container
>>>> > image.
>>>> >
>>>> > Important: For Maven (and other build tools) to correctly package
>>>> > the dependencies into the application jar, these application
>>>> > dependencies must be specified in scope compile (unlike the core
>>>> > dependencies, which must be specified in scope provided).
>>>>
>>>> ## Hive connector dependencies should be in provided scope
>>>>
>>>> However, the [Flink Hive Integration docs](
>>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/hive/#program-maven)
>>>> suggest the opposite:
>>>>
>>>> > If you are building your own program, you need the following
>>>> > dependencies in your mvn file. It’s recommended not to include these
>>>> > dependencies in the resulting jar file. You’re supposed to add
>>>> > dependencies as stated above at runtime.
>>>>
>>>> ## Why?
>>>>
>>>> Thanks!
>>>>
>>>> Best,
>>>> Yik San
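[Editor's note] To make the scope contrast from the question concrete, a minimal pom fragment might look like the following. This is a sketch assuming the usual `flink.version`/`scala.binary.version` properties are defined in the pom; artifact names follow the Flink 1.12 conventions shown in the quoted docs.

```xml
<!-- Bundled into the application jar: default (compile) scope. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
</dependency>

<!-- Expected on the cluster at runtime (from Flink's lib/ directory):
     provided scope, so the shade/assembly plugin leaves it out of the
     application jar, keeping the jar small. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-hive_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
```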
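[Editor's note] The lib-directory setup Till describes can be sketched as below. The jar name follows the naming scheme of Flink's pre-bundled Hive connectors for Flink 1.12 and is an assumption here; pick the artifact matching your Hive/Scala/Flink versions. A scratch directory stands in for a real FLINK_HOME so the sketch runs end-to-end.

```shell
# Sketch only: paths and the jar name are assumptions, not exact instructions.
# On a real installation FLINK_HOME would be e.g. /opt/flink; a scratch
# directory is used here so the sketch is self-contained.
FLINK_HOME="${FLINK_HOME:-$(mktemp -d)}"
mkdir -p "$FLINK_HOME/lib"

# Stand-in for a pre-bundled Hive connector jar downloaded from Maven Central.
JAR=flink-sql-connector-hive-2.3.6_2.11-1.12.0.jar
touch "$JAR"

# Put the dependency into lib/ *before* starting the cluster. lib/ is only
# scanned at startup, so after any change to it, restart the cluster:
#   $FLINK_HOME/bin/stop-cluster.sh && $FLINK_HOME/bin/start-cluster.sh
cp "$JAR" "$FLINK_HOME/lib/"
```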