Re: PySpark, Structured Streaming and Kafka

2017-08-24 Thread Brian Wylie
Resolved :) Hi, just a loopback on this (thanks for everyone's help). In a Jupyter notebook the following command works and properly loads the Kafka jar files. # Spin up a local Spark Session spark = SparkSession.builder.appName('my_awesome')\ .config('spark.jars.packages', 'org.apache.s

Re: PySpark, Structured Streaming and Kafka

2017-08-23 Thread Brian Wylie
Shixiong, Your suggestion works if I use the pyspark-shell directly. In this case I want to set up a Spark Session from within my Jupyter Notebook. My question/issue is related to this SO question: https://stackoverflow.com/questions/35762459/add-jar-to-standalone-pyspark so basically I want to a

Re: PySpark, Structured Streaming and Kafka

2017-08-23 Thread Shixiong(Ryan) Zhu
You can use `bin/pyspark --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0` to start "pyspark". If you want to use "spark-submit", you also need to provide your Python file. On Wed, Aug 23, 2017 at 1:41 PM, Brian Wylie wrote: > Hi All, > > I'm trying the new hotness of using Kafka and
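When the `bin/pyspark` launcher is not the entry point (for example in a notebook kernel), one commonly used alternative is to pass the same `--packages` flag through the `PYSPARK_SUBMIT_ARGS` environment variable before any Spark session is created. The following is only a minimal sketch under that assumption: the helper names `kafka_package` and `set_submit_args` are hypothetical, and the default Scala and Spark versions are taken from the coordinate quoted in this message, so they would need to match the local installation.

```python
import os


def kafka_package(scala_version="2.11", spark_version="2.2.0"):
    """Build the Maven coordinate of the Spark SQL Kafka 0-10 connector.

    Hypothetical helper; the defaults mirror the coordinate quoted in the thread.
    """
    return "org.apache.spark:spark-sql-kafka-0-10_{}:{}".format(
        scala_version, spark_version
    )


def set_submit_args(packages, env=os.environ):
    """Store a --packages flag in PYSPARK_SUBMIT_ARGS (hypothetical helper).

    Assumption: this runs before the first SparkSession or SparkContext is
    created in the process, because the variable is read when the JVM launches.
    The trailing 'pyspark-shell' token is the application resource.
    """
    args = "--packages {} pyspark-shell".format(",".join(packages))
    env["PYSPARK_SUBMIT_ARGS"] = args
    return args


submit_args = set_submit_args([kafka_package()])
print(submit_args)
```

After this runs, a session built in the same process with `SparkSession.builder` should resolve the connector at startup, provided no Spark JVM was already running.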

Re: PySpark, Structured Streaming and Kafka

2017-08-23 Thread Riccardo Ferrari
Hi Brian, Very nice work you have done! WRT your issue: Can you clarify how you are adding the Kafka dependency when using Jupyter? The ClassNotFoundException really tells you about the missing dependency. The IllegalArgumentException error is a bit different: that is simply because you are not t