Hm OK I am crazy then. I think I never noticed it because I had always used a distro that did actually supply this on the classpath. Well ... I think it would be reasonable to include these things (at least, Kafka integration) by default in the binary distro. I'll update the JIRA to reflect that this is at best a Wish.
On Sat, Aug 4, 2018 at 4:17 PM Jacek Laskowski <ja...@japila.pl> wrote:

> Hi Sean,
>
> It's been for years I'd say that you had to specify --packages to get the
> Kafka-related jars on the classpath. I simply got used to this annoyance
> (as did others). Could it be that it's an external package (although an
> integral part of Spark)?!
>
> I'm very glad you've brought it up since I think Kafka data source is so
> important that it should be included in spark-shell and spark-submit by
> default. THANKS!
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> Mastering Spark SQL https://bit.ly/mastering-spark-sql
> Spark Structured Streaming https://bit.ly/spark-structured-streaming
> Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> Follow me at https://twitter.com/jaceklaskowski
>
> On Sat, Aug 4, 2018 at 9:56 PM, Sean Owen <sro...@gmail.com> wrote:
>
>> Let's take this to https://issues.apache.org/jira/browse/SPARK-25026 --
>> I provisionally marked this a Blocker, as if it's correct, then the release
>> is missing an important piece and we'll want to remedy that ASAP. I still
>> have this feeling I am missing something. The classes really aren't there
>> in the release but ... *nobody* noticed all this time? I guess maybe
>> Spark-Kafka users may be using a vendor distro that does package these bits.
>>
>>
>> On Sat, Aug 4, 2018 at 10:48 AM Sean Owen <sro...@gmail.com> wrote:
>>
>>> I was debugging why a Kafka-based streaming app doesn't seem to find
>>> Kafka-related integration classes when run standalone from our latest 2.3.1
>>> release, and noticed that there doesn't seem to be any Kafka-related jars
>>> from Spark in the distro. In jars/, I see:
>>>
>>> spark-catalyst_2.11-2.3.1.jar
>>> spark-core_2.11-2.3.1.jar
>>> spark-graphx_2.11-2.3.1.jar
>>> spark-hive-thriftserver_2.11-2.3.1.jar
>>> spark-hive_2.11-2.3.1.jar
>>> spark-kubernetes_2.11-2.3.1.jar
>>> spark-kvstore_2.11-2.3.1.jar
>>> spark-launcher_2.11-2.3.1.jar
>>> spark-mesos_2.11-2.3.1.jar
>>> spark-mllib-local_2.11-2.3.1.jar
>>> spark-mllib_2.11-2.3.1.jar
>>> spark-network-common_2.11-2.3.1.jar
>>> spark-network-shuffle_2.11-2.3.1.jar
>>> spark-repl_2.11-2.3.1.jar
>>> spark-sketch_2.11-2.3.1.jar
>>> spark-sql_2.11-2.3.1.jar
>>> spark-streaming_2.11-2.3.1.jar
>>> spark-tags_2.11-2.3.1.jar
>>> spark-unsafe_2.11-2.3.1.jar
>>> spark-yarn_2.11-2.3.1.jar
>>>
>>> I checked make-distribution.sh, and it copies a bunch of JARs into the
>>> distro, but does not seem to touch the kafka modules.
>>>
>>> Am I crazy or missing something obvious -- those should be in the
>>> release, right?
>>>
>>
>
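For anyone who wants to run the same check on their own distro, here is a rough sketch, not anything that ships with Spark. It scans a list of jar file names (such as the jars/ listing quoted above) for the Kafka integration modules and, if none are found, builds the coordinate to pass to --packages. The helper names `kafka_jars` and `packages_coordinate` are made up for illustration. The artifact prefixes and the `spark-sql-kafka-0-10` coordinate are assumptions based on how the Kafka modules are published for 2.3.x; adjust them for your Scala and Spark versions.

```python
from typing import Iterable, List

# Assumed file-name prefixes of the Spark Kafka integration modules.
KAFKA_PREFIXES = ("spark-sql-kafka-", "spark-streaming-kafka-")


def kafka_jars(jar_names: Iterable[str]) -> List[str]:
    """Return the jar names that look like Spark Kafka integration modules."""
    return [name for name in jar_names if name.startswith(KAFKA_PREFIXES)]


def packages_coordinate(scala_version: str, spark_version: str) -> str:
    """Build a Maven coordinate for --packages (structured streaming Kafka source)."""
    return f"org.apache.spark:spark-sql-kafka-0-10_{scala_version}:{spark_version}"


# A few of the jar names from the jars/ listing quoted above.
listed = [
    "spark-core_2.11-2.3.1.jar",
    "spark-sql_2.11-2.3.1.jar",
    "spark-streaming_2.11-2.3.1.jar",
    "spark-yarn_2.11-2.3.1.jar",
]

found = kafka_jars(listed)
if found:
    print("Kafka integration jars present:", ", ".join(found))
else:
    print("No Kafka integration jars; try --packages "
          + packages_coordinate("2.11", "2.3.1"))
```

In practice you would feed it the real directory contents, for example `os.listdir` on the distro's jars/ directory, rather than a hard-coded list.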