Otherwise, include them at the time of execution. Here is an example:

spark-submit \
  --jars /opt/cloudera/parcels/CDH/lib/zookeeper/zookeeper-3.4.5-cdh5.1.0.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/lib/guava-12.0.1.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/lib/protobuf-java-2.5.0.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop-compat.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-server.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core.jar \
  --class org.apache.spark.hbase.example.HBaseBulkDeleteExample \
  --master yarn --deploy-mode client \
  --executor-memory 512M --num-executors 4 \
  --driver-java-options -Dspark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase/lib/* \
  SparkHBase.jar t1 c
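A variant of the same submit that avoids the long --jars list, sketched on the assumption that the CDH parcel paths exist on every node (--driver-class-path and the spark.executor.extraClassPath property are standard spark-submit/Spark options):

spark-submit \
  --driver-class-path "/opt/cloudera/parcels/CDH/lib/hbase/lib/*" \
  --conf "spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase/lib/*" \
  --class org.apache.spark.hbase.example.HBaseBulkDeleteExample \
  --master yarn --deploy-mode client \
  --executor-memory 512M --num-executors 4 \
  SparkHBase.jar t1 c

Note the trade-off: the extraClassPath settings only prepend paths that must already exist locally on each node, whereas --jars actually ships the files with the job, so the two are not interchangeable unless the parcels are installed everywhere.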
On Wed, Nov 12, 2014 at 4:25 PM, Hari Shreedharan <hshreedha...@cloudera.com> wrote:

> Yep, you’d need to shade jars to ensure all your dependencies are in the
> classpath.
>
> Thanks,
> Hari
>
> On Wed, Nov 12, 2014 at 3:23 AM, Ted Malaska <ted.mala...@cloudera.com> wrote:
>
>> Hey, this is Ted.
>>
>> Are you using Shade when you build your jar, and are you using the
>> bigger jar? It looks like classes are not included in your jar.
>>
>> On Wed, Nov 12, 2014 at 2:09 AM, Jeniba Johnson <jeniba.john...@lntinfotech.com> wrote:
>>
>>> Hi Hari,
>>>
>>> Now I am trying out the same FlumeEventCount example, running it with
>>> spark-submit instead of run-example. The steps I followed: I exported
>>> JavaFlumeEventCount.java into a jar.
>>>
>>> The command used is
>>>
>>> ./bin/spark-submit --jars lib/spark-examples-1.1.0-hadoop1.0.4.jar --master local --class org.JavaFlumeEventCount bin/flumeeventcnt2.jar localhost 2323
>>>
>>> The output is
>>>
>>> 14/11/12 17:55:02 INFO scheduler.ReceiverTracker: Stream 0 received 1 blocks
>>> 14/11/12 17:55:02 INFO scheduler.JobScheduler: Added jobs for time 1415795102000
>>>
>>> If I use this command
>>>
>>> ./bin/spark-submit --master local --class org.JavaFlumeEventCount bin/flumeeventcnt2.jar localhost 2323
>>>
>>> then I get an error:
>>>
>>> Spark assembly has been built with Hive, including Datanucleus jars on classpath
>>> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/examples/streaming/StreamingExamples
>>>     at org.JavaFlumeEventCount.main(JavaFlumeEventCount.java:22)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:601)
>>>     at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
>>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>> Caused by: java.lang.ClassNotFoundException: org.apache.spark.examples.streaming.StreamingExamples
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>>     ... 8 more
>>>
>>> I just wanted to ask: it is able to find spark-assembly.jar, but why
>>> not spark-examples.jar?
>>>
>>> My next doubt: while running the FlumeEventCount example through
>>> run-example, I get an output as
>>>
>>> Received 4 flume events.
>>> 14/11/12 18:30:14 INFO scheduler.JobScheduler: Finished job streaming job 1415797214000 ms.0 from job set of time 1415797214000 ms
>>> 14/11/12 18:30:14 INFO rdd.MappedRDD: Removing RDD 70 from persistence list
>>>
>>> But if I run the same program through spark-submit, I get an output as
>>>
>>> 14/11/12 17:55:02 INFO scheduler.ReceiverTracker: Stream 0 received 1 blocks
>>> 14/11/12 17:55:02 INFO scheduler.JobScheduler: Added jobs for time 1415795102000
>>>
>>> So I need a clarification: in the program, the print statement is
>>> written as "Received n flume events.", so how come I am able to see
>>> "Stream 0 received n blocks"?
>>> And what is the difference between running the program through
>>> spark-submit and through run-example?
>>>
>>> Awaiting your kind reply.
>>>
>>> Regards,
>>> Jeniba Johnson
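For reference, the shading that Hari and Ted describe above means building a single "uber" jar that carries the application's dependencies inside it. A minimal sketch with the Maven Shade Plugin (the plugin coordinates are real; the version shown and any relocation or filter rules are assumptions that depend on the actual build):

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.3</version>
      <executions>
        <execution>
          <!-- Runs during "mvn package" and folds the compile-scope
               dependencies into the application jar. -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

Spark and Hadoop artifacts are usually declared with <scope>provided</scope> so the shaded jar does not duplicate what the cluster already supplies.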
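On the NoClassDefFoundError quoted above: the stack trace points at JavaFlumeEventCount.java line 22, which in the stock example is presumably the call to StreamingExamples.setStreamingLogLevels(), a helper class that ships only in the spark-examples jar. That would explain why the job runs when spark-examples-1.1.0-hadoop1.0.4.jar is passed via --jars and fails without it: spark-submit always puts the Spark assembly on the classpath, but never the examples jar. A self-contained sketch of the copied class (assuming log4j, which Spark 1.x uses; only the helper call changes):

// Sketch: the examples-only StreamingExamples.setStreamingLogLevels()
// helper is replaced with direct log4j calls, so the application jar no
// longer needs spark-examples on the classpath.
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public final class JavaFlumeEventCount {
  public static void main(String[] args) {
    // Was: StreamingExamples.setStreamingLogLevels();
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN);
    Logger.getLogger("org.eclipse.jetty").setLevel(Level.WARN);

    // ... the rest of the example (streaming context setup, the Flume
    // stream, count() and print(), start/awaitTermination) is unchanged ...
  }
}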