In standalone cluster mode, spark-submit does not upload jars to the node
where the driver runs. This is a known issue:
https://issues.apache.org/jira/browse/SPARK-4160
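
A common workaround (just a sketch; the host names and paths below are
illustrative) is to stage the uber jar somewhere every node can read it, e.g. a
shared filesystem, HDFS, or simply the same local path on each machine, and
point spark-submit at that location rather than at a path that only exists on
the submitting machine:

    # copy the jar to the same path on every node (or put it on HDFS/NFS)
    scp my_jar.jar user@<worker-host>:/opt/spark/app-jars/my_jar.jar

    ./spark-submit \
        --master spark://<domain>:<port> \
        --deploy-mode cluster \
        --class <class_main> \
        /opt/spark/app-jars/my_jar.jar   # must be readable at this path on whichever worker launches the driver

Running with client deploy mode (as you found) sidesteps this, since the driver
then runs on the machine you submit from.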

On Wed, Dec 27, 2017 at 1:27 AM, Geoff Von Allmen <ge...@ibleducation.com>
wrote:

> I’ve tried it both ways.
>
> The uber jar gives me the following:
>
>    - Caused by: java.lang.ClassNotFoundException: Failed to find data
>    source: kafka. Please find packages at http://spark.apache.org/third-
>    party-projects.html
>
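> (One thing I have not ruled out on the uber-jar side: if the assembly merge
> drops or overwrites the META-INF/services files, Spark cannot discover the
> kafka data source even though the classes are inside the jar. With
> maven-shade-plugin the usual fix is the ServicesResourceTransformer; a rough
> sketch, with the plugin version just as an example:
>
>     <plugin>
>       <groupId>org.apache.maven.plugins</groupId>
>       <artifactId>maven-shade-plugin</artifactId>
>       <version>3.1.0</version>
>       <executions>
>         <execution>
>           <phase>package</phase>
>           <goals><goal>shade</goal></goals>
>           <configuration>
>             <transformers>
>               <!-- concatenate META-INF/services entries so the kafka
>                    DataSourceRegister registration survives the merge -->
>               <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
>             </transformers>
>           </configuration>
>         </execution>
>       </executions>
>     </plugin>
>
> That would only matter for the uber-jar attempt, of course.)
>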
> If I only do minimal packaging, add
> org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar via --packages, and then
> add it to the --driver-class-path, I get past that error, but then I hit the
> error I showed in the original post.
>
> I agree that it looks like the kafka-clients jar is missing, since that is
> where ByteArrayDeserializer lives, though as far as I can tell the jar is
> actually present.
>
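> A quick way to double-check is to list the jar contents on the machine where
> the driver actually runs (paths here are from my setup and may differ):
>
>     jar tf ~/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar | grep ByteArrayDeserializer
>     jar tf my_jar.jar | grep ByteArrayDeserializer   # for the uber-jar build
>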
> I can see the following two packages in the ClassPath entries on the
> history server (Though the source shows: **********(redacted) — not sure
> why?)
>
>    - spark://<ip>:<port>/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar
>    - spark://<ip>:<port>/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
>
> As a side note, I'm running both the master and a worker on the same system
> just to test running in cluster mode. Not sure if that has anything to do
> with it; I would think it would make things easier, since everything has
> access to the same file system, but I'm pretty new to Spark.
>
> I have also read through and followed those instructions as well as many
> others at this point.
>
> Thanks!
>
> On Wed, Dec 27, 2017 at 12:56 AM, Eyal Zituny <eyal.zit...@equalum.io>
> wrote:
>
>> Hi,
>> it seems that you're missing the kafka-clients jar (and probably some
>> other dependencies as well).
>> How did you package your application jar? Does it include all the
>> required dependencies (as an uber jar)?
>> If it's not an uber jar, you need to pass, via the driver class path and
>> the executor class path, all the files/dirs where your dependencies can
>> be found (note that those must be accessible from each node in the
>> cluster); a rough sketch follows.
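>> Something along these lines (illustrative only; the directory would hold
>> kafka-clients and the other dependency jars, and has to exist on every node):
>>
>>     ./spark-submit \
>>         --master spark://<domain>:<port> \
>>         --deploy-mode cluster \
>>         --driver-class-path "/opt/spark/extra-jars/*" \
>>         --conf "spark.executor.extraClassPath=/opt/spark/extra-jars/*" \
>>         --class <class_main> \
>>         my_jar.jar
>>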
>> I suggest going over the manual
>> <https://spark.apache.org/docs/latest/submitting-applications.html>
>>
>> Eyal
>>
>>
>> On Wed, Dec 27, 2017 at 1:08 AM, Geoff Von Allmen <ge...@ibleducation.com> wrote:
>>
>>> I am trying to deploy to a standalone cluster but am running into
>>> ClassNotFound errors.
>>>
>>> I have tried a whole myriad of different approaches, ranging from
>>> packaging all dependencies into a single JAR to using the --packages
>>> and --driver-class-path options.
>>>
>>> I've got a master node started, a slave node running on the same system,
>>> and am using spark-submit to kick off the streaming job.
>>>
>>> Here is the error I’m getting:
>>>
>>> Exception in thread "main" java.lang.reflect.InvocationTargetException
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at 
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>     at 
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>>     at 
>>> org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
>>>     at 
>>> org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
>>> Caused by: java.lang.NoClassDefFoundError: 
>>> org/apache/kafka/common/serialization/ByteArrayDeserializer
>>>     at 
>>> org.apache.spark.sql.kafka010.KafkaSourceProvider$.<init>(KafkaSourceProvider.scala:376)
>>>     at 
>>> org.apache.spark.sql.kafka010.KafkaSourceProvider$.<clinit>(KafkaSourceProvider.scala)
>>>     at 
>>> org.apache.spark.sql.kafka010.KafkaSourceProvider.validateStreamOptions(KafkaSourceProvider.scala:323)
>>>     at 
>>> org.apache.spark.sql.kafka010.KafkaSourceProvider.sourceSchema(KafkaSourceProvider.scala:60)
>>>     at 
>>> org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:198)
>>>     at 
>>> org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:88)
>>>     at 
>>> org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:88)
>>>     at 
>>> org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
>>>     at 
>>> org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:150)
>>>     at com.Customer.start(Customer.scala:47)
>>>     at com.Main$.main(Main.scala:23)
>>>     at com.Main.main(Main.scala)
>>>     ... 6 more
>>> Caused by: java.lang.ClassNotFoundException: 
>>> org.apache.kafka.common.serialization.ByteArrayDeserializer
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>     ... 18 more
>>>
>>> Here is the spark-submit command I'm using:
>>>
>>> ./spark-submit \
>>>     --master spark://<domain>:<port> \
>>>     --files jaas.conf \
>>>     --deploy-mode cluster \
>>>     --driver-java-options "-Djava.security.auth.login.config=./jaas.conf" \
>>>     --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./jaas.conf" \
>>>     --packages org.apache.spark:spark-sql-kafka-0-10_2.11 \
>>>     --driver-class-path ~/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.1.jar \
>>>     --class <class_main> \
>>>     --verbose \
>>>     my_jar.jar
>>>
>>> I've tried all sorts of combinations of --packages and --driver-class-path
>>> jar files. As far as I can tell, the serializer should be in the
>>> kafka-clients jar, which I've tried including, with no success.
>>>
>>> Pom Dependencies are as follows:
>>>
>>>     <dependencies>
>>>         <dependency>
>>>             <groupId>org.scala-lang</groupId>
>>>             <artifactId>scala-library</artifactId>
>>>             <version>2.11.12</version>
>>>         </dependency>
>>>         <dependency>
>>>             <groupId>org.apache.spark</groupId>
>>>             <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
>>>             <version>2.2.1</version>
>>>         </dependency>
>>>         <dependency>
>>>             <groupId>org.apache.spark</groupId>
>>>             <artifactId>spark-core_2.11</artifactId>
>>>             <version>2.2.1</version>
>>>         </dependency>
>>>         <dependency>
>>>             <groupId>org.apache.spark</groupId>
>>>             <artifactId>spark-sql_2.11</artifactId>
>>>             <version>2.2.1</version>
>>>         </dependency>
>>>         <dependency>
>>>             <groupId>org.apache.spark</groupId>
>>>             <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
>>>             <version>2.2.1</version>
>>>         </dependency>
>>>         <dependency>
>>>             <groupId>mysql</groupId>
>>>             <artifactId>mysql-connector-java</artifactId>
>>>             <version>8.0.8-dmr</version>
>>>         </dependency>
>>>         <dependency>
>>>             <groupId>joda-time</groupId>
>>>             <artifactId>joda-time</artifactId>
>>>             <version>2.9.9</version>
>>>         </dependency>
>>>     </dependencies>
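>>>
>>> (Side question, in case it matters: when building the uber jar for
>>> spark-submit, should the Spark artifacts themselves be marked provided, so
>>> that only the Kafka connector and its kafka-clients dependency get bundled?
>>> Something like:
>>>
>>>         <dependency>
>>>             <groupId>org.apache.spark</groupId>
>>>             <artifactId>spark-sql_2.11</artifactId>
>>>             <version>2.2.1</version>
>>>             <scope>provided</scope>
>>>         </dependency>
>>>
>>> I have not confirmed whether that changes anything here.)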
>>>
>>> If I remove --deploy-mode and run it as client … it works just fine.
>>>
>>> Thanks Everyone -
>>>
>>> Geoff V.
>>>
>>
>>
>
