Hi Sparks,
I have a Spark Streaming application built as a Maven project, and I would like to
package it into an uber jar and run it on the cluster.
I have found two ways to build the uber jar, but each has its
shortcomings, so I would like to ask how you do it.
Thanks.
1. Use the maven-shade-plugin and mark the Spark dependencies as
provided in the pom.xml, like:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
With this, the uber jar builds, but when I run the application
locally it complains that Spark classes are missing. That is not
surprising, because dependencies marked as provided are
not included on the runtime classpath.
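One possible way around the local-run problem in option 1, as a rough sketch rather than a verified setup: drive the scope from a Maven property that defaults to provided, and override it in a profile used only for local runs. The property name spark.scope and the profile id local-run are made up for illustration.

```xml
<!-- Sketch only: spark.scope and local-run are hypothetical names -->
<properties>
  <!-- default: Spark stays out of the uber jar -->
  <spark.scope>provided</spark.scope>
</properties>

<profiles>
  <profile>
    <id>local-run</id>
    <properties>
      <!-- local runs: put Spark back on the runtime classpath -->
      <spark.scope>compile</spark.scope>
    </properties>
  </profile>
</profiles>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>${spark.version}</version>
    <scope>${spark.scope}</scope>
  </dependency>
</dependencies>
```

With this, a plain mvn package would keep Spark out of the jar, while activating the profile with -Plocal-run would treat Spark as a normal compile-scope dependency for local runs.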
2. Instead of marking the Spark dependencies as provided, I configure the
maven-shade-plugin to exclude them as follows, but many things are still
included in the jar.
<executions>
  <execution>
    <phase>package</phase>
    <goals>
      <goal>shade</goal>
    </goals>
    <configuration>
      <artifactSet>
        <excludes>
          <exclude>junit:junit</exclude>
          <exclude>log4j:log4j:jar:</exclude>
          <exclude>org.scala-lang:scala-library:jar:</exclude>
          <exclude>org.apache.spark:spark-core_2.10</exclude>
          <exclude>org.apache.spark:spark-sql_2.10</exclude>
          <exclude>org.apache.spark:spark-streaming_2.10</exclude>
        </excludes>
      </artifactSet>
    </configuration>
  </execution>
</executions>
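As far as I understand, artifactSet excludes only drop the artifacts that are listed, so the transitive dependencies that Spark pulls in would still be bundled, which may explain the leftovers in option 2. A sketch of the opposite approach, whitelisting what goes into the jar instead; com.example is a placeholder groupId used only for illustration.

```xml
<!-- Sketch only: com.example:* is a placeholder pattern -->
<artifactSet>
  <includes>
    <!-- bundle only artifacts whose groupId matches; everything else is left out -->
    <!-- I believe the project's own artifact goes through the same filter,
         so the pattern should match it as well -->
    <include>com.example:*</include>
  </includes>
</artifactSet>
```

Any third-party library the application needs at runtime, and that the cluster does not already provide, would have to be added as a further include entry.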
Has anyone built an uber jar for a Spark application? I would like to see
how you do it. Thanks!