To: Shuporno Choudhury
Cc: "Jörn Franke [via Apache Spark User List]"
Subject: Re: [PySpark] Releasing memory after a spark job is finished
Can you tell us what version of Spark you are using and whether Dynamic
Allocation is enabled?
Also, how are the files being read? Is it a single read of all files using a
file-matching regex, or are you running different threads in the same
PySpark job?
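For context, here is a rough sketch of the two read patterns I am asking
about (the paths and CSV format below are made up, not taken from your job):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("read-patterns").getOrCreate()

    # Pattern 1: a single read of all files via a matching glob
    df_all = spark.read.csv("/data/input/dataset_*.csv", header=True)

    # Pattern 2: separate reads per dataset inside the same job,
    # possibly driven from different threads
    df_a = spark.read.csv("/data/input/dataset_a.csv", header=True)
    df_b = spark.read.csv("/data/input/dataset_b.csv", header=True)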
On Mon 4 Jun, 2018, 1:27 PM Shuporno Choudhury wrote:
Thanks a lot for the insight.
Actually, I have the exact same transformations for all the datasets, hence
only one Python script.
Now, do you suggest that I run a separate spark-submit for each of the
different datasets, given that the transformations are exactly the same?
On Tue 5 Jun, 2018, 1:48 AM Jörn Franke wrote:
Yes, if they are independent, with different transformations, then I would
create a separate Python program for each. Especially with big data
processing frameworks, one should avoid putting everything into one big
monolithic application.
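As a very rough illustration only (the script name, paths, and Parquet
format are placeholders, not your actual job), each dataset would get its
own small program and its own spark-submit:

    # process_one_dataset.py -- an independent program per dataset
    import sys
    from pyspark.sql import SparkSession

    if __name__ == "__main__":
        input_path, output_path = sys.argv[1], sys.argv[2]

        spark = SparkSession.builder.appName("process-one-dataset").getOrCreate()
        df = spark.read.parquet(input_path)
        # ... the shared transformations go here ...
        df.write.mode("overwrite").parquet(output_path)
        spark.stop()  # memory is released for good when this process exits

    # launched once per dataset, for example:
    #   spark-submit process_one_dataset.py /data/in/ds1 /data/out/ds1
    #   spark-submit process_one_dataset.py /data/in/ds2 /data/out/ds2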
On 4. Jun 2018, at 22:02, Shuporno Choudhury wrote:
Hi,
Thanks for the input.
I was trying to get the functionality working first, hence I was using local
mode. I will definitely be running on a cluster, but later.
Sorry for my naivety, but can you please elaborate on the modularity
concept that you mentioned and how it will affect what I am already doing?
Why don’t you modularize your code and write an independent Python program
for each process that is submitted via Spark?
Not sure though if Spark local mode makes sense. If you don’t have a
cluster, then a normal Python program can be much better.
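Just as a sketch of what I mean by a normal Python program (assuming the
data fits in memory and that pandas is acceptable for your case, which it
may not be):

    # plain Python, no Spark: one small program per dataset
    import pandas as pd

    df = pd.read_csv("/data/input/dataset_a.csv")
    # ... the same transformations, expressed with pandas ...
    df.to_csv("/data/output/dataset_a.csv", index=False)
    # all memory is released when this script exits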
On 4. Jun 2018, at 21:37, Shuporno Choudhury wrote: