Checkpoints not cleaned using Spark streaming + watermarking + kafka

2017-09-21 Thread MathieuP
Hi Spark users! :) I come to you with a question about checkpoints. I have a streaming application that consumes from and produces to Kafka. The computation requires a window and watermarking. Since this is a streaming application with a Kafka output, a checkpoint is expected. The application runs u

Re: How to pass sparkSession from driver to executor

2017-09-21 Thread ayan guha
The point here is that the Spark session is not available on executors, so you have to use appropriate storage clients. On Fri, Sep 22, 2017 at 1:44 AM, lucas.g...@gmail.com wrote: > I'm not sure what you're doing. But I have in the past used spark to > consume a manifest file and then execute a .map

Re: How to pass sparkSession from driver to executor

2017-09-21 Thread lucas.g...@gmail.com
I'm not sure what you're doing, but in the past I have used Spark to consume a manifest file and then run .mapPartitions on the result, like this: def map_key_to_event(s3_events_data_lake): def _map_key_to_event(event_key_list, s3_client=test_stub): print("Events in list")
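The idea in this reply (build a storage client inside each partition instead of relying on the Spark session) can be sketched in plain Python. This is only an illustration under stated assumptions: `StubStorageClient`, `make_partition_reader`, and `read_partition` are hypothetical names, not part of the original message or of any library, and the stub stands in for a real client such as an S3 client.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple


class StubStorageClient:
    """Hypothetical stand-in for a real storage client (e.g. an S3 client)."""

    def __init__(self, objects: Dict[str, str]) -> None:
        self._objects = objects

    def get(self, key: str) -> str:
        # Return the stored payload for the given key.
        return self._objects[key]


def make_partition_reader(
    client_factory: Callable[[], StubStorageClient],
) -> Callable[[Iterable[str]], Iterator[Tuple[str, str]]]:
    """Build a function that can process one partition of keys."""

    def read_partition(keys: Iterable[str]) -> Iterator[Tuple[str, str]]:
        # The client is created once per partition, inside the function,
        # so nothing from the driver's session has to be shipped along.
        client = client_factory()
        for key in keys:
            yield key, client.get(key)

    return read_partition
```

With a real RDD of keys, the returned function would be passed to `rdd.mapPartitions(...)`, and the factory would build a real client instead of the stub.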

Re: How to pass sparkSession from driver to executor

2017-09-21 Thread Weichen Xu
Spark does not allow executor code to use `sparkSession`. But I think you can move all the JSON files to one directory, and then run: ``` spark.read.json("/path/to/jsonFileDir") ``` But if you want to get the filename at the same time, you can use ``` spark.sparkContext.wholeTextFiles("/path/to/jsonFileDir")

Re: How to pass sparkSession from driver to executor

2017-09-21 Thread Riccardo Ferrari
It depends on your use case; however, broadcasting could be a better option. On Thu, Sep 21, 2017 at 2:03 PM, Chackravarthy Esakkimuthu < chaku.mi...@gmail.com> wrote: > Hi, > > I want to know how to pass sparkSession

Re: for loops in pyspark

2017-09-21 Thread Femi Anthony
How much memory do the "do some stuff" portions occupy? You should try caching the RDD and take a look at the Spark UI under the Storage tab to see how much memory is being used. Also, what portion of the overall memory of each worker are you allocating when you call spark-submit? Sent from

How to know what are possible operations spark raw sql can support?

2017-09-21 Thread kant kodali
How can I find out all the operations that Spark raw SQL supports? Is there any documentation? Thanks!

How to pass sparkSession from driver to executor

2017-09-21 Thread Chackravarthy Esakkimuthu
Hi, I want to know how to pass sparkSession from the driver to an executor. I have a Spark program (batch job) which does the following, # val spark = SparkSession.builder().appName("SampleJob").config( "spark.master", "local") .getOrCreate() val df = this is dataframe which has list of f

Re: for loops in pyspark

2017-09-21 Thread Jeff Zhang
I suspect the OOM happens on the executor side; you will have to check the stack trace yourself if you cannot attach more info. Most likely it is due to your user code. Alexander Czech wrote on Thu, Sep 21, 2017 at 5:54 PM: > That is not really possible the whole project is rather large and I would > not like to release

Re: Is there a SparkILoop for Java?

2017-09-21 Thread Jeff Zhang
You could try Zeppelin, which provides a REST API to execute code: https://zeppelin.apache.org/docs/0.8.0-SNAPSHOT/usage/rest_api/notebook.html#create-a-new-note kant kodali wrote on Thu, Sep 21, 2017 at 4:09 AM: > Is there an API like SparkILoop >

Re: for loops in pyspark

2017-09-21 Thread Alexander Czech
That is not really possible; the whole project is rather large and I would not like to release it before I have published the results. But if there are no known issues with running Spark in a for loop, I will look into other possibilities for memory leaks. Thanks On 20 Sep 2017 15:22, "Weichen Xu" wrote: