One of the packages just contains the streaming-kafka code. The other contains that code, plus everything it depends on. That's what "assembly" typically means in JVM land.
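To make that concrete: a jar is just a zip archive, so you can check whether a dependency's classes were bundled by listing its entries. Here is a minimal Python sketch, assuming you have a local copy of the jar. The helper name `jar_bundles_class` is made up for illustration and is not part of Spark.

```python
import zipfile


def jar_bundles_class(jar_path, class_name):
    """Return True if the jar at jar_path contains the given class.

    class_name is dotted, e.g. "com.yammer.metrics.core.Gauge".
    (Hypothetical helper for illustration only.)
    """
    # Classes are stored in the archive as path/to/Name.class entries.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

Called on the assembly jar with `com.yammer.metrics.core.Gauge`, this should return True, in line with the `jar tvf` listing quoted below. It is the same check, just scripted.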
Java/Scala users are accustomed to using their own build tool to include necessary dependencies. JVM dependency management is (thankfully) different from Python dependency management.

As far as I can tell, there is no core issue, upstream or otherwise.

On Tue, May 12, 2015 at 11:39 AM, Lee McFadden <[email protected]> wrote:
> Thanks again for all the help folks.
>
> I can confirm that simply switching to
> `--packages org.apache.spark:spark-streaming-kafka-assembly_2.10:1.3.1`
> makes everything work as intended.
>
> I'm not sure what the difference is between the two packages honestly, or
> why one should be used over the other, but the documentation is currently
> not intuitive in this matter. If you follow the instructions, initially it
> will seem broken. Is there any reason why the docs for Python users (or,
> in fact, all users - Java/Scala users will run into this too except they
> are armed with the ability to build their own jar with the dependencies
> included) should not be changed to using the assembly package by default?
>
> Additionally, after a few google searches yesterday combined with your
> help I'm wondering if the core issue is upstream in Kafka's dependency
> chain?
>
> On Tue, May 12, 2015 at 8:53 AM Ted Yu <[email protected]> wrote:
>
>> bq. it is already in the assembly
>>
>> Yes. Verified:
>>
>> $ jar tvf ~/Downloads/spark-streaming-kafka-assembly_2.10-1.3.1.jar | grep yammer | grep Gauge
>> 1329 Sat Apr 11 04:25:50 PDT 2015 com/yammer/metrics/core/Gauge.class
>>
>> On Tue, May 12, 2015 at 8:05 AM, Sean Owen <[email protected]> wrote:
>>
>>> It doesn't depend directly on yammer metrics; Kafka does. It wouldn't
>>> be correct to declare that it does; it is already in the assembly
>>> anyway.
>>>
>>> On Tue, May 12, 2015 at 3:50 PM, Ted Yu <[email protected]> wrote:
>>> > Currently external/kafka/pom.xml doesn't cite yammer metrics as dependency.
>>> >
>>> > $ ls -l ~/.m2/repository/com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar
>>> > -rw-r--r-- 1 tyu staff 82123 Dec 17 2013 /Users/tyu/.m2/repository/com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar
>>> >
>>> > Including the metrics-core jar would not increase the size of the final
>>> > release artifact much.
>>> >
>>> > My two cents.
