The real question is whether every third-party integration should be
built into Spark's main assembly. Reasonable people could disagree, but
I think the current state (not built in) is fine. It does mean you have
to bring the integration, and its dependencies, with you.

That is, no, third-party queue integrations aren't built in out of the box.

The way you got it to work is one way, but not the preferred one.
Preferably, you build the integration into your app, and your packaging
tool resolves the transitive dependencies for you.
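For a Python job with no app jar of its own, the command-line route is one option. As a rough sketch, here is the working spark-submit invocation quoted further down, assembled from Python. Note that build_spark_submit is a hypothetical helper and not part of Spark; only the --packages, --jars, --conf and --master flags and their values come from this thread.

```python
import shlex


def build_spark_submit(script, packages=(), jars=(), conf=None, master=None,
                       spark_submit="spark-submit"):
    """Assemble a spark-submit argument list.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    args = [spark_submit]
    if packages:
        # Maven coordinates, comma-separated, for spark-submit to fetch
        args += ["--packages", ",".join(packages)]
    if jars:
        # Extra jars supplied by hand, comma-separated
        args += ["--jars", ",".join(jars)]
    for key, value in (conf or {}).items():
        args += ["--conf", f"{key}={value}"]
    if master:
        args += ["--master", master]
    args.append(script)
    return args


cmd = build_spark_submit(
    "affected_hosts.py",
    packages=[
        "TargetHolding/pyspark-cassandra:0.1.4",
        "org.apache.spark:spark-streaming-kafka_2.10:1.3.1",
    ],
    jars=[
        "/home/ubuntu/jars/metrics-core-2.2.0.jar",
        "/home/ubuntu/jars/metrics-annotation-2.2.0.jar",
    ],
    conf={
        "spark.cassandra.connection.host":
            "10.10.103.172,10.10.102.160,10.10.101.79",
    },
    master="spark://127.0.0.1:7077",
)

# Print a shell-safe rendering; pass `cmd` to subprocess.run to launch it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the arguments as a list avoids shell-quoting surprises if the command is later launched with subprocess.run(cmd).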

I agree with resolving this as basically working-as-intended.

On Tue, May 12, 2015 at 3:19 AM, Lee McFadden <[email protected]> wrote:
> I opened a ticket on this (without posting here first - bad etiquette,
> apologies) which was closed as 'fixed'.
>
> https://issues.apache.org/jira/browse/SPARK-7538
>
> I don't believe that getting my script to run means this is fixed; I
> think it is still an issue.
>
> I downloaded the spark source, ran `mvn -DskipTests clean package `, then
> simply launched my python script (which shouldn't be introducing additional
> *java* dependencies itself?).
>
> Doesn't this mean these dependencies are missing from the spark build, since
> I didn't modify any files within the distribution and my application itself
> can't be introducing java dependency clashes?
>
>
> On Mon, May 11, 2015, 4:34 PM Lee McFadden <[email protected]> wrote:
>>
>> Ted, many thanks.  I'm not used to Java dependencies so this was a real
>> head-scratcher for me.
>>
>> Downloading the two metrics packages from the maven repository
>> (metrics-core, metrics-annotation) and supplying them on the spark-submit
>> command line worked.
>>
>> My final spark-submit for a python project using Kafka as an input source:
>>
>> /home/ubuntu/spark/spark-1.3.1/bin/spark-submit \
>>     --packages TargetHolding/pyspark-cassandra:0.1.4,org.apache.spark:spark-streaming-kafka_2.10:1.3.1 \
>>     --jars /home/ubuntu/jars/metrics-core-2.2.0.jar,/home/ubuntu/jars/metrics-annotation-2.2.0.jar \
>>     --conf spark.cassandra.connection.host=10.10.103.172,10.10.102.160,10.10.101.79 \
>>     --master spark://127.0.0.1:7077 \
>>     affected_hosts.py
>>
>> Now we're seeing data from the stream.  Thanks again!
>>
>> On Mon, May 11, 2015 at 2:43 PM Sean Owen <[email protected]> wrote:
>>>
>>> Ah yes, the Kafka + streaming code isn't in the assembly, is it? You'd
>>> have to provide it and all its dependencies with your app. You could
>>> also build this into your own app jar. Tools like Maven will add in
>>> the transitive dependencies.
>>>
>>> On Mon, May 11, 2015 at 10:04 PM, Lee McFadden <[email protected]>
>>> wrote:
>>> > Thanks Ted,
>>> >
>>> > The issue is that I'm using packages (see spark-submit definition) and
>>> > I do not know how to add com.yammer.metrics:metrics-core to my
>>> > classpath so Spark can see it.
>>> >
>>> > Should metrics-core not be part of the
>>> > org.apache.spark:spark-streaming-kafka_2.10:1.3.1 package so it can
>>> > work correctly?
>>> >
>>> > If not, any clues as to how I can add metrics-core to my project
>>> > (bearing in mind that I'm using Python, not a JVM language) would be
>>> > much appreciated.
>>> >
>>> > Thanks, and apologies for my newbness with Java/Scala.
>>> >
>>> > On Mon, May 11, 2015 at 1:42 PM Ted Yu <[email protected]> wrote:
>>> >>
>>> >> com.yammer.metrics.core.Gauge is in metrics-core jar
>>> >> e.g., in master branch:
>>> >> [INFO] |  \- org.apache.kafka:kafka_2.10:jar:0.8.1.1:compile
>>> >> [INFO] |     +- com.yammer.metrics:metrics-core:jar:2.2.0:compile
>>> >>
>>> >> Please make sure metrics-core jar is on the classpath.
>>> >>
>>> >> On Mon, May 11, 2015 at 1:32 PM, Lee McFadden <[email protected]>
>>> >> wrote:
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> We've been having some issues getting spark streaming running
>>> >>> correctly using a Kafka stream, and we've been going around in
>>> >>> circles trying to resolve this dependency.
>>> >>>
>>> >>> Details of our environment and the error below, if anyone can help
>>> >>> resolve this it would be much appreciated.
>>> >>>
>>> >>> Submit command line:
>>> >>>
>>> >>> /home/ubuntu/spark/spark-1.3.1/bin/spark-submit \
>>> >>>     --packages TargetHolding/pyspark-cassandra:0.1.4,org.apache.spark:spark-streaming-kafka_2.10:1.3.1 \
>>> >>>     --conf spark.cassandra.connection.host=10.10.103.172,10.10.102.160,10.10.101.79 \
>>> >>>     --master spark://127.0.0.1:7077 \
>>> >>>     affected_hosts.py
>>> >>>
>>> >>> When we run the streaming job everything starts just fine, then we
>>> >>> see the following in the logs:
>>> >>>
>>> >>> 15/05/11 19:50:46 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 70, ip-10-10-102-53.us-west-2.compute.internal): java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge
>>> >>>         at kafka.consumer.ZookeeperConsumerConnector.createFetcher(ZookeeperConsumerConnector.scala:151)
>>> >>>         at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:115)
>>> >>>         at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:128)
>>> >>>         at kafka.consumer.Consumer$.create(ConsumerConnector.scala:89)
>>> >>>         at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:100)
>>> >>>         at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:121)
>>> >>>         at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:106)
>>> >>>         at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$8.apply(ReceiverTracker.scala:298)
>>> >>>         at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$8.apply(ReceiverTracker.scala:290)
>>> >>>         at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)
>>> >>>         at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)
>>> >>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>> >>>         at org.apache.spark.scheduler.Task.run(Task.scala:64)
>>> >>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>>> >>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> >>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> >>>         at java.lang.Thread.run(Thread.java:745)
>>> >>> Caused by: java.lang.ClassNotFoundException: com.yammer.metrics.core.Gauge
>>> >>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>>> >>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>> >>>         at java.security.AccessController.doPrivileged(Native Method)
>>> >>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>>> >>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>> >>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> >>>         ... 17 more
>>> >>>
>>> >>>
>>> >>
>>> >
