Otherwise, include them at the time of execution. Here is an example:

spark-submit \
  --jars /opt/cloudera/parcels/CDH/lib/zookeeper/zookeeper-3.4.5-cdh5.1.0.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/lib/guava-12.0.1.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/lib/protobuf-java-2.5.0.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop-compat.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-server.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core.jar \
  --class org.apache.spark.hbase.example.HBaseBulkDeleteExample \
  --master yarn --deploy-mode client \
  --executor-memory 512M --num-executors 4 \
  --driver-java-options -Dspark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase/lib/* \
  SparkHBase.jar t1 c
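A variant of the same submit that avoids the long --jars list, sketched on the assumption that the CDH parcel paths exist on every node (--driver-class-path and the spark.executor.extraClassPath property are standard spark-submit/Spark options):

spark-submit \
  --driver-class-path "/opt/cloudera/parcels/CDH/lib/hbase/lib/*" \
  --conf "spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase/lib/*" \
  --class org.apache.spark.hbase.example.HBaseBulkDeleteExample \
  --master yarn --deploy-mode client \
  --executor-memory 512M --num-executors 4 \
  SparkHBase.jar t1 c

Note the trade-off: the extraClassPath settings only prepend paths that must already exist locally on each node, whereas --jars actually ships the files with the job, so the two are not interchangeable unless the parcels are installed everywhere.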
On Wed, Nov 12, 2014 at 4:25 PM, Hari Shreedharan <hshreedha...@cloudera.com> wrote:

> Yep, you’d need to shade jars to ensure all your dependencies are in the
> classpath.
>
> Thanks,
> Hari
>
> On Wed, Nov 12, 2014 at 3:23 AM, Ted Malaska <ted.mala...@cloudera.com> wrote:
>
>> Hey, this is Ted.
>>
>> Are you using Shade when you build your jar, and are you using the
>> bigger jar? It looks like classes are not included in your jar.
>>
>> On Wed, Nov 12, 2014 at 2:09 AM, Jeniba Johnson <jeniba.john...@lntinfotech.com> wrote:
>>
>>> Hi Hari,
>>>
>>> Now I am trying out the same FlumeEventCount example, running it with
>>> spark-submit instead of run-example. The steps I followed: I exported
>>> JavaFlumeEventCount.java into a jar.
>>>
>>> The command used is
>>>
>>> ./bin/spark-submit --jars lib/spark-examples-1.1.0-hadoop1.0.4.jar --master local --class org.JavaFlumeEventCount bin/flumeeventcnt2.jar localhost 2323
>>>
>>> The output is
>>>
>>> 14/11/12 17:55:02 INFO scheduler.ReceiverTracker: Stream 0 received 1 blocks
>>> 14/11/12 17:55:02 INFO scheduler.JobScheduler: Added jobs for time 1415795102000
>>>
>>> If I use this command
>>>
>>> ./bin/spark-submit --master local --class org.JavaFlumeEventCount bin/flumeeventcnt2.jar localhost 2323
>>>
>>> then I get an error:
>>>
>>> Spark assembly has been built with Hive, including Datanucleus jars on classpath
>>> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/examples/streaming/StreamingExamples
>>>     at org.JavaFlumeEventCount.main(JavaFlumeEventCount.java:22)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:601)
>>>     at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
>>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>> Caused by: java.lang.ClassNotFoundException: org.apache.spark.examples.streaming.StreamingExamples
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>>     ... 8 more
>>>
>>> I just wanted to ask: it is able to find spark-assembly.jar, but why
>>> not spark-examples.jar?
>>>
>>> My next doubt: while running the FlumeEventCount example through
>>> run-example, I get an output as
>>>
>>> Received 4 flume events.
>>> 14/11/12 18:30:14 INFO scheduler.JobScheduler: Finished job streaming job 1415797214000 ms.0 from job set of time 1415797214000 ms
>>> 14/11/12 18:30:14 INFO rdd.MappedRDD: Removing RDD 70 from persistence list
>>>
>>> But if I run the same program through spark-submit, I get an output as
>>>
>>> 14/11/12 17:55:02 INFO scheduler.ReceiverTracker: Stream 0 received 1 blocks
>>> 14/11/12 17:55:02 INFO scheduler.JobScheduler: Added jobs for time 1415795102000
>>>
>>> So I need a clarification: in the program, the print statement is
>>> written as "Received n flume events.", so how come I am able to see
>>> "Stream 0 received n blocks"?
>>> And what is the difference between running the program through
>>> spark-submit and through run-example?
>>>
>>> Awaiting your kind reply.
>>>
>>> Regards,
>>> Jeniba Johnson
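For reference, the shading that Hari and Ted describe above means building a single "uber" jar that carries the application's dependencies inside it. A minimal sketch with the Maven Shade Plugin (the plugin coordinates are real; the version shown and any relocation or filter rules are assumptions that depend on the actual build):

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.3</version>
      <executions>
        <execution>
          <!-- Runs during "mvn package" and folds the compile-scope
               dependencies into the application jar. -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

Spark and Hadoop artifacts are usually declared with <scope>provided</scope> so the shaded jar does not duplicate what the cluster already supplies.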
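On the NoClassDefFoundError quoted above: the stack trace points at JavaFlumeEventCount.java line 22, which in the stock example is presumably the call to StreamingExamples.setStreamingLogLevels(), a helper class that ships only in the spark-examples jar. That would explain why the job runs when spark-examples-1.1.0-hadoop1.0.4.jar is passed via --jars and fails without it: spark-submit always puts the Spark assembly on the classpath, but never the examples jar. A self-contained sketch of the copied class (assuming log4j, which Spark 1.x uses; only the helper call changes):

// Sketch: the examples-only StreamingExamples.setStreamingLogLevels()
// helper is replaced with direct log4j calls, so the application jar no
// longer needs spark-examples on the classpath.
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public final class JavaFlumeEventCount {
  public static void main(String[] args) {
    // Was: StreamingExamples.setStreamingLogLevels();
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN);
    Logger.getLogger("org.eclipse.jetty").setLevel(Level.WARN);

    // ... the rest of the example (streaming context setup, the Flume
    // stream, count() and print(), start/awaitTermination) is unchanged ...
  }
}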