To: Shuporno Choudhury <[hidden email]>
Cc: "Jörn Franke [via Apache Spark User List]" <[hidden email]>, <[hidden email]>
Subject: Re: [PySpark] Releasing memory after a spark job is finished

Can you tell us what version of Spark you are using and if Dynamic Allocation
is enabled?
Also, how are the files being read? Is it a single read of all files using a
file […]
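For context, these are the settings in question; a minimal sketch of checking or enabling them when building the session (the app name and values are illustrative, and on a real cluster dynamic allocation generally also needs the external shuffle service):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("memory-check")  # hypothetical app name
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")
    .getOrCreate()
)

# Report what the running session actually uses:
print(spark.version)
print(spark.conf.get("spark.dynamicAllocation.enabled", "false"))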
Why don’t you modularize your code and write an independent Python program
for each process that is submitted via Spark?
Not sure, though, whether Spark local makes sense. If you don’t have a cluster,
then a normal Python program can be much better.
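A minimal sketch of that approach (script names and paths are hypothetical): one small driver that processes a single data set, launched once per data set, so each run's JVM exits and releases its memory when the job finishes:

# process_one.py -- handles exactly one data set
import sys
from pyspark.sql import SparkSession

if __name__ == "__main__":
    in_path, out_path = sys.argv[1], sys.argv[2]
    spark = SparkSession.builder.appName("one-dataset").getOrCreate()
    df = spark.read.parquet(in_path)  # assuming parquet input for the sketch
    df.write.mode("overwrite").parquet(out_path)
    spark.stop()  # process exit releases all JVM memory

# launch.py -- submits one independent job per data set
import subprocess
pairs = [("/data/in/ds1", "/data/out/ds1"),
         ("/data/in/ds2", "/data/out/ds2")]  # hypothetical paths
for in_path, out_path in pairs:
    subprocess.run(["spark-submit", "--master", "local[*]",
                    "process_one.py", in_path, out_path], check=True)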
On 4. Jun 2018, at 21:37, Shuporno Choudhury wrote:
Hi everyone,
I am trying to run a pyspark code on some data sets sequentially [basically
1. Read data into a dataframe 2. Perform some join/filter/aggregation 3.
Write modified data in parquet format to a target location].
Now, while running this pyspark code across *multiple independent data sets*
sequentially, […] using local[*] as master. There is a single SparkSession
that is doing all the processing.
If it is not possible to clear out memory, what can be a better approach for
this problem?

Can someone please help me with this and tell me if I am going wrong anywhere?

--Thanks,
Shuporno Choudhury
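For reference, a minimal sketch of the loop described above (paths and column names are hypothetical), including an attempt to drop cached data between data sets within the single SparkSession:

from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .master("local[*]")
         .appName("sequential-datasets")
         .getOrCreate())

for path in ["/data/in/ds1", "/data/in/ds2"]:  # hypothetical inputs
    df = spark.read.parquet(path)                       # 1. read into a dataframe
    out = (df.filter(F.col("status") == "active")       # 2. join/filter/aggregation (illustrative)
             .groupBy("key")
             .agg(F.sum("value").alias("total")))
    out.write.mode("overwrite").parquet(path + "_out")  # 3. write parquet to a target
    spark.catalog.clearCache()  # drops cached DataFrames/tables; does not shrink the JVM heap itself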