Cluster mode doesn't upload jars to the driver node. This is a known issue: https://issues.apache.org/jira/browse/SPARK-4160
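Until that is fixed, one workaround in standalone cluster mode is to make the dependency jars visible to the driver yourself, since the driver gets launched on one of the worker nodes. A rough sketch of what that can look like (all paths, hosts and versions below are placeholders for your own layout, and options like --files jaas.conf are left out for brevity), assuming the two Kafka jars have been copied to the same local directory on every node:

# Sketch only: pre-stage the jars at an identical local path on each node,
# then point the driver at those local copies; --jars still ships them to the executors.
KAFKA_JARS=/opt/spark/extra-jars/spark-sql-kafka-0-10_2.11-2.2.1.jar,/opt/spark/extra-jars/kafka-clients-0.10.0.1.jar

./spark-submit \
  --master spark://<domain>:<port> \
  --deploy-mode cluster \
  --jars "$KAFKA_JARS" \
  --driver-class-path "/opt/spark/extra-jars/spark-sql-kafka-0-10_2.11-2.2.1.jar:/opt/spark/extra-jars/kafka-clients-0.10.0.1.jar" \
  --class <class_main> \
  my_jar.jar

Alternatively, --packages with the full groupId:artifactId:version coordinate (e.g. org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.1) lets Spark resolve spark-sql-kafka and its kafka-clients dependency itself, as long as the node that ends up running the driver can reach the repository or already has a populated ivy cache. There is also a quick sanity check for the uber-jar route at the bottom of this mail, below the quoted thread.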
On Wed, Dec 27, 2017 at 1:27 AM, Geoff Von Allmen <ge...@ibleducation.com> wrote:

> I’ve tried it both ways.
>
> The uber jar gives me the following:
>
>    - Caused by: java.lang.ClassNotFoundException: Failed to find data source: kafka.
>      Please find packages at http://spark.apache.org/third-party-projects.html
>
> If I only do minimal packaging and add org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
> as a --package and then add it to the --driver-class-path, then I get past that error, but
> I get the error I showed in the original post.
>
> I agree it seems it’s missing the kafka-clients jar file, as that is where the
> ByteArrayDeserializer is, though it looks like it’s present as far as I can tell.
>
> I can see the following two packages in the ClassPath entries on the history server
> (though the source shows: **********(redacted) — not sure why?):
>
>    - spark://<ip>:<port>/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar
>    - spark://<ip>:<port>/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
>
> As a side note, I’m running both a master and a worker on the same system just to test out
> running in cluster mode. Not sure if that would have anything to do with it. I would think
> it would make it easier since it's got access to all the same file system... but I'm pretty
> new to Spark.
>
> I have also read through and followed those instructions, as well as many others at this point.
>
> Thanks!
>
>
> On Wed, Dec 27, 2017 at 12:56 AM, Eyal Zituny <eyal.zit...@equalum.io> wrote:
>
>> Hi,
>> it seems that you're missing the kafka-clients jar (and probably some other dependencies
>> as well).
>> How did you package your application jar? Does it include all the required dependencies
>> (as an uber jar)?
>> If it's not an uber jar, you need to pass via the driver-class-path and the
>> executor-class-path all the files\dirs where your dependencies can be found
>> (note that those must be accessible from each node in the cluster).
>> I suggest going over the manual
>> <https://spark.apache.org/docs/latest/submitting-applications.html>
>>
>> Eyal
>>
>>
>> On Wed, Dec 27, 2017 at 1:08 AM, Geoff Von Allmen <ge...@ibleducation.com> wrote:
>>
>>> I am trying to deploy a standalone cluster but am running into ClassNotFound errors.
>>>
>>> I have tried a whole myriad of different approaches, varying from packaging all
>>> dependencies into a single JAR to using the --packages and --driver-class-path options.
>>>
>>> I’ve got a master node started, a slave node running on the same system, and am using
>>> spark-submit to get the streaming job kicked off.
>>>
>>> Here is the error I’m getting:
>>>
>>> Exception in thread "main" java.lang.reflect.InvocationTargetException
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>>     at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
>>>     at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
>>> Caused by: java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArrayDeserializer
>>>     at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<init>(KafkaSourceProvider.scala:376)
>>>     at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<clinit>(KafkaSourceProvider.scala)
>>>     at org.apache.spark.sql.kafka010.KafkaSourceProvider.validateStreamOptions(KafkaSourceProvider.scala:323)
>>>     at org.apache.spark.sql.kafka010.KafkaSourceProvider.sourceSchema(KafkaSourceProvider.scala:60)
>>>     at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:198)
>>>     at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:88)
>>>     at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:88)
>>>     at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
>>>     at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:150)
>>>     at com.Customer.start(Customer.scala:47)
>>>     at com.Main$.main(Main.scala:23)
>>>     at com.Main.main(Main.scala)
>>>     ... 6 more
>>> Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.ByteArrayDeserializer
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>     ... 18 more
>>>
>>> Here is the spark-submit command I’m using:
>>>
>>> ./spark-submit \
>>>     --master spark://<domain>:<port> \
>>>     --files jaas.conf \
>>>     --deploy-mode cluster \
>>>     --driver-java-options "-Djava.security.auth.login.config=./jaas.conf" \
>>>     --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./jaas.conf" \
>>>     --packages org.apache.spark:spark-sql-kafka-0-10_2.11 \
>>>     --driver-class-path ~/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.1.jar \
>>>     --class <class_main> \
>>>     --verbose \
>>>     my_jar.jar
>>>
>>> I’ve tried all sorts of combinations of including different packages and
>>> driver-class-path jar files. As far as I can find, the serializer should be in the
>>> kafka-clients jar file, which I’ve tried including to no success.
>>>
>>> Pom dependencies are as follows:
>>>
>>> <dependencies>
>>>     <dependency>
>>>         <groupId>org.scala-lang</groupId>
>>>         <artifactId>scala-library</artifactId>
>>>         <version>2.11.12</version>
>>>     </dependency>
>>>     <dependency>
>>>         <groupId>org.apache.spark</groupId>
>>>         <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
>>>         <version>2.2.1</version>
>>>     </dependency>
>>>     <dependency>
>>>         <groupId>org.apache.spark</groupId>
>>>         <artifactId>spark-core_2.11</artifactId>
>>>         <version>2.2.1</version>
>>>     </dependency>
>>>     <dependency>
>>>         <groupId>org.apache.spark</groupId>
>>>         <artifactId>spark-sql_2.11</artifactId>
>>>         <version>2.2.1</version>
>>>     </dependency>
>>>     <dependency>
>>>         <groupId>org.apache.spark</groupId>
>>>         <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
>>>         <version>2.2.1</version>
>>>     </dependency>
>>>     <dependency>
>>>         <groupId>mysql</groupId>
>>>         <artifactId>mysql-connector-java</artifactId>
>>>         <version>8.0.8-dmr</version>
>>>     </dependency>
>>>     <dependency>
>>>         <groupId>joda-time</groupId>
>>>         <artifactId>joda-time</artifactId>
>>>         <version>2.9.9</version>
>>>     </dependency>
>>> </dependencies>
>>>
>>> If I remove --deploy-mode and run it as client … it works just fine.
>>>
>>> Thanks Everyone -
>>>
>>> Geoff V.
>>>
>>>
>>
>>
>
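One more check that may be worth doing before digging further, whichever packaging route you take (just a sketch; my_jar.jar and the ivy path are the names from your mail): confirm the missing class is actually present in whatever the driver is supposed to load it from.

# Sketch: was kafka-clients actually bundled into the uber jar?
unzip -l my_jar.jar | grep ByteArrayDeserializer

# And when using --packages / --driver-class-path, did ivy actually pull kafka-clients?
ls ~/.ivy2/jars/ | grep kafka-clients

If the class is missing from the uber jar, the assembly step is dropping the transitive kafka-clients dependency; if it is present, the problem is more likely that the driver in cluster mode simply isn't getting that jar on its classpath, per the JIRA above.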