How much memory do the "do some stuff" portions occupy? You should try
caching the RDD and take a look at the Spark UI under the Storage tab to see
how much memory is being used. Also, what portion of each worker's overall
memory are you allocating when you call spark-submit?
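A minimal sketch of that check (my_rdd and somefunc come from the example at
the bottom of the thread; the submit flags are illustrative values, not a
recommendation):

    # cache the per-chunk RDD so its size shows up under the Storage tab
    my_rdd = sc.parallelize(chunk).flatMap(somefunc)
    my_rdd.cache()
    my_rdd.count()  # force evaluation so the cached size is actually recorded

    # then compare that size against what each executor was given, e.g.:
    # spark-submit --executor-memory 4g --num-executors 8 my_script.py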
I suspect the OOM happens on the executor side; you will have to check the
stack trace yourself if you cannot attach more info. Most likely it is due
to your user code.
Alexander Czech wrote on Thu, 21 Sep 2017 at 5:54 PM:
That is not really possible; the whole project is rather large and I would
not like to release it before I have published the results.
But if there are no known issues with running Spark in a for loop, I will
look into other possibilities for memory leaks.
Thanks
On 20 Sep 2017 15:22, "Weichen Xu" wrote:
Spark manages memory allocation and release automatically. Can you post the
complete program? That would help with checking where things go wrong.
On Wed, Sep 20, 2017 at 8:12 PM, Alexander Czech <
alexander.cz...@googlemail.com> wrote:
Hello all,
I'm running a pyspark script that makes use of a for loop to create smaller
chunks of my main dataset.
Some example code:
for chunk in chunks:
    my_rdd = sc.parallelize(chunk).flatMap(somefunc)
    # do some stuff with my_rdd
    my_df = make_df(my_rdd)
    # do some stuff with my_df
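If caching is added inside this loop (as suggested earlier in the thread),
one pattern worth considering is unpersisting each chunk's RDD before the
next iteration, since cached RDDs from earlier iterations otherwise keep
occupying executor memory. A sketch, not a confirmed fix for this thread's
issue:

    for chunk in chunks:
        my_rdd = sc.parallelize(chunk).flatMap(somefunc)
        my_rdd.cache()
        # do some stuff with my_rdd
        my_df = make_df(my_rdd)
        # do some stuff with my_df
        my_rdd.unpersist()  # release cached blocks before the next iteration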