I might have missed it, but can you tell whether the OOM is happening in the driver or in an executor? It would also help if you posted the actual exception (the full stack trace).
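In the meantime, a heap dump from whichever JVM dies would settle the driver-vs-executor question. A sketch of the submit flags, assuming YARN cluster mode; the class name, jar, and memory sizes are placeholder assumptions, not recommendations:

  # Sketch only: master, class name, jar and sizes are assumptions.
  # -XX:+HeapDumpOnOutOfMemoryError makes the failing JVM write a
  # .hprof file, so you can see which side ran out and what filled it.
  spark-submit \
    --master yarn --deploy-mode cluster \
    --driver-memory 20g \
    --executor-memory 20g \
    --conf "spark.driver.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp" \
    --conf "spark.executor.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp" \
    --class com.example.JsonJob \
    your-job.jar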
On Tue 5 Jun, 2018, 1:55 PM Nicolas Paris, <nipari...@gmail.com> wrote:

> IMO your JSON cannot be read in parallel at all, so Spark only offers you
> the option to play with memory again.
>
> I'd say that at some step it has to fit in both a single executor and the
> driver. I'd try something like 20GB for both the driver and the executors,
> with a dynamic number of executors, in order to then repartition that fat
> JSON.
>
> 2018-06-05 22:40 GMT+02:00 raksja <shanmugkr...@gmail.com>:
>
>> Yes, I would say that's the first thing I tried. The thing is, even
>> though I provide more executors and more memory for each, this process
>> gets OOM in only one task, which stays stuck and unfinished.
>>
>> I don't think it's splitting the load across the other tasks.
>>
>> The file I stored in HDFS had 11 blocks, and I got 11 partitions in my
>> DataFrame. When I did show(1), it spun up 11 tasks; 10 passed quickly,
>> 1 got stuck and OOMed.
>>
>> I also repartitioned to 1000, and that didn't help either.
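For what it's worth, the "10 tasks fast, 1 stuck" pattern usually means the parallelism was fixed at read time, before repartition() ever runs. A minimal Scala sketch of the two read modes, with hypothetical paths (fat.json / fat.jsonl are assumptions):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("json-oom").getOrCreate()

  // multiLine JSON is not splittable: the whole document is parsed by a
  // single task, and repartition() only runs after that parse succeeds,
  // so it cannot prevent the OOM.
  val whole = spark.read
    .option("multiLine", "true")
    .json("hdfs:///data/fat.json")      // hypothetical path

  // If the producer can emit JSON Lines (one object per line) instead,
  // the file becomes splittable and the parse load spreads across tasks;
  // a single oversized record, though, still has to fit in one task.
  val lines = spark.read.json("hdfs:///data/fat.jsonl")
  lines.repartition(1000).write.parquet("hdfs:///data/fat_parquet")

If one record really is the bulk of the file, no amount of repartitioning after the read will help; either the producer has to split it, or the single task that parses it needs enough memory to hold it.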