Re: [PySpark] Releasing memory after a spark job is finished

2018-06-04 Thread Shuporno Choudhury

Re: [PySpark] Releasing memory after a spark job is finished

2018-06-04 Thread Jörn Franke
> …clear out memory, what can be a better approach for this problem?
> Can someone please help me with this and tell me if I am going wrong anywhere?
> --Thanks, Shuporno Choudhury

Re: [PySpark] Releasing memory after a spark job is finished

2018-06-04 Thread Thakrar, Jayesh
Can you tell us what version of Spark you are using and whether Dynamic Allocation is enabled? Also, how are the files being read? Is it a single read of all files using a file…
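For reference, the Dynamic Allocation setting Jayesh asks about is controlled by standard Spark configuration properties. A hedged sketch of how it is typically enabled on a cluster (the executor counts are made-up examples, and note that dynamic allocation targets cluster managers, not `local[*]` mode):

```shell
# Enable dynamic executor allocation (requires the external shuffle service).
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  your_job.py
```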

Re: [PySpark] Releasing memory after a spark job is finished

2018-06-04 Thread Jay

Re: [PySpark] Releasing memory after a spark job is finished

2018-06-04 Thread Shuporno Choudhury
> …using local[*] as master. There is a single SparkSession that is doing all the processing.
> If it is not possible to clear out memory, what can be a better approach for this problem?
> Can someone please help me with this and tell me if I am going wrong anywhere?
> --Thanks, Shuporno Choudhury
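On the "clear out memory" question quoted above, PySpark does expose cache controls within a single SparkSession. A hedged sketch (`DataFrame.unpersist()` and `spark.catalog.clearCache()` are real PySpark APIs, but whether they fully return driver JVM memory to the OS is exactly the open question in this thread; the helper name is made up):

```python
def release_between_runs(spark, dataframes):
    """Drop cached data between independent dataset runs.

    This releases cached blocks inside the session; it does not
    guarantee the driver JVM shrinks its heap afterwards.
    """
    for df in dataframes:
        df.unpersist()          # drop this DataFrame's cached blocks
    spark.catalog.clearCache()  # drop everything else cached in the session
```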

Re: [PySpark] Releasing memory after a spark job is finished

2018-06-04 Thread Jörn Franke
> …using local[*] as master. There is a single SparkSession that is doing all the processing.
> If it is not possible to clear out memory, what can be a better approach for this problem?
> Can someone please help me with this and tell me if I am going wrong anywhere?

Re: [PySpark] Releasing memory after a spark job is finished

2018-06-04 Thread Shuporno Choudhury
> …what can be a better approach for this problem?
> Can someone please help me with this and tell me if I am going wrong anywhere?
> --Thanks, Shuporno Choudhury

Re: [PySpark] Releasing memory after a spark job is finished

2018-06-04 Thread Jörn Franke
Why don’t you modularize your code and write an independent Python program for each process that is submitted via Spark? Not sure though whether Spark local mode makes sense; if you don’t have a cluster, then a normal Python program can be much better.

> On 4. Jun 2018, at 21:37, Shuporno Choudhury wrote:
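Jörn's suggestion can be sketched as a small driver that launches one `spark-submit` per dataset, so each run gets a fresh driver JVM whose memory is released when the process exits. This is a hedged illustration, not code from the thread; the script name and dataset paths are made up:

```python
import subprocess

def build_submit_cmd(job_script, dataset_path):
    """Assemble a spark-submit command line for one dataset."""
    return ["spark-submit", "--master", "local[*]", job_script, dataset_path]

def run_all(job_script, dataset_paths):
    # One independent driver process per dataset; the OS reclaims all
    # of its memory when each job exits.
    for path in dataset_paths:
        subprocess.run(build_submit_cmd(job_script, path), check=True)

# Example (not executed here):
# run_all("process_one.py", ["/data/ds1", "/data/ds2"])
```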

[PySpark] Releasing memory after a spark job is finished

2018-06-04 Thread Shuporno Choudhury
Hi everyone,

I am trying to run a PySpark job on some data sets sequentially. Basically:
1. Read data into a DataFrame
2. Perform some join/filter/aggregation
3. Write the modified data in Parquet format to a target location

Now, while running this PySpark code across multiple independent data sets se…
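The three steps described can be sketched as follows — a minimal illustration, not the poster's actual code; the input format, paths, and column names are all assumptions:

```python
def output_path(base, name):
    """Pure helper: target location for one dataset (hypothetical layout)."""
    return f"{base.rstrip('/')}/{name}"

def process_dataset(spark, in_path, out_path):
    # 1. Read data into a DataFrame (Parquet input is an assumption).
    df = spark.read.parquet(in_path)
    # 2. Perform some join/filter/aggregation (illustrative only).
    agg = (df.filter(df["status"] == "active")
             .groupBy("customer_id")
             .count())
    # 3. Write the modified data in Parquet format to the target location.
    agg.write.mode("overwrite").parquet(out_path)

# Usage (requires a PySpark environment; not executed here):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.master("local[*]").getOrCreate()
# for name in ["ds1", "ds2"]:  # hypothetical dataset names
#     process_dataset(spark, f"/input/{name}", output_path("/output", name))
```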