Re: [PySpark] Releasing memory after a spark job is finished

2018-06-04 Thread Jörn Franke
> "Jörn Franke [via Apache Spark User List]" > Subject: Re: [PySpark] Releasing memory after a spark job is finished > Can you tell us what version of Spark you are using and if Dynamic Allocation is enabled? Also, how are the files being read? Is it a…

Re: [PySpark] Releasing memory after a spark job is finished

2018-06-04 Thread Thakrar, Jayesh
"Jörn Franke [via Apache Spark User List]" , Subject: Re: [PySpark] Releasing memory after a spark job is finished Can you tell us what version of Spark you are using and if Dynamic Allocation is enabled ? Also, how are the files being read ? Is it a single read of all files using a file

Re: [PySpark] Releasing memory after a spark job is finished

2018-06-04 Thread Jay
Can you tell us what version of Spark you are using and if Dynamic Allocation is enabled? Also, how are the files being read? Is it a single read of all files using a file-matching regex, or are you running different threads in the same pyspark job? On Mon 4 Jun, 2018, 1:27 PM Shuporno Choudhu…
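
A minimal sketch of the two things Jay is asking about, with hypothetical paths and app name: enabling dynamic allocation (so idle executors are returned to the cluster manager) and reading many files in one pass via a glob pattern rather than separate threads per file.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("per-dataset-etl")  # hypothetical name
             .config("spark.dynamicAllocation.enabled", "true")
             .config("spark.shuffle.service.enabled", "true")  # required for dynamic allocation on YARN
             .getOrCreate())

    print(spark.version)  # the Spark version Jay asks about

    # A single read of all matching files, instead of one thread per file:
    df = spark.read.csv("/data/input/part-*.csv", header=True)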

Re: [PySpark] Releasing memory after a spark job is finished

2018-06-04 Thread Shuporno Choudhury
Thanks a lot for the insight. Actually, I have the exact same transformations for all the datasets, hence only one Python script. Now, do you suggest that I run a separate spark-submit for each dataset, given that the transformations are exactly the same? On Tue 5 Jun, 2018, 1:48 AM Jörn Frank…
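
What that could look like is a single parameterized script run once per dataset; a minimal sketch, where one_dataset_job.py and the paths are hypothetical names, and each run is its own driver process whose memory is released when it exits:

    # one_dataset_job.py (hypothetical name); takes input and output paths as arguments
    import sys
    from pyspark.sql import SparkSession

    if __name__ == "__main__":
        in_path, out_path = sys.argv[1], sys.argv[2]
        spark = SparkSession.builder.appName("etl-" + in_path).getOrCreate()
        df = spark.read.parquet(in_path)
        # ... the shared transformations go here ...
        df.write.mode("overwrite").parquet(out_path)
        spark.stop()  # stops the SparkContext; the JVM's memory is freed when the process exits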

Re: [PySpark] Releasing memory after a spark job is finished

2018-06-04 Thread Jörn Franke
Yes, if they are independent, with different transformations, then I would create a separate Python program for each. Especially with big data processing frameworks, one should avoid putting everything into one big monolithic application. > On 4. Jun 2018, at 22:02, Shuporno Choudhury wrote: > Hi, …
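
One way to drive that, assuming the parameterized script sketched above and hypothetical dataset paths, is a small launcher that runs one independent spark-submit per dataset, so each job's JVM (and its memory) goes away when that job finishes:

    import subprocess

    datasets = ["/data/raw/sales", "/data/raw/inventory", "/data/raw/returns"]  # hypothetical paths
    for in_path in datasets:
        out_path = in_path.replace("/raw/", "/clean/")
        # Each call is a separate OS process and JVM; memory is fully released on exit.
        subprocess.run(["spark-submit", "one_dataset_job.py", in_path, out_path], check=True)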

Re: [PySpark] Releasing memory after a spark job is finished

2018-06-04 Thread Shuporno Choudhury
Hi, Thanks for the input. I was trying to get the functionality working first, hence I was using local mode. I will definitely be running on a cluster, but later. Sorry for my naivety, but can you please elaborate on the modularity concept that you mentioned and how it will affect whatever I am already do…

Re: [PySpark] Releasing memory after a spark job is finished

2018-06-04 Thread Jörn Franke
Why don’t you modularize your code and write, for each process, an independent Python program that is submitted via Spark? Not sure, though, if Spark local mode makes sense. If you don’t have a cluster, then a normal Python program can be much better. > On 4. Jun 2018, at 21:37, Shuporno Choudhury wro…
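
To illustrate Jörn's last point: without a cluster, local-mode Spark adds JVM overhead with little parallelism benefit, and a plain Python program can do the same per-dataset work. A minimal sketch with hypothetical paths, using pandas purely as one example of "a normal python program":

    import pandas as pd

    df = pd.read_csv("/data/input/sales.csv")  # hypothetical path
    # ... the same transformations, expressed in pandas ...
    df.to_csv("/data/clean/sales.csv", index=False)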