I think most Spark technical support folks would recommend upgrading to Spark 2.0+ for starters, but I understand that's not always possible. In that case, I would double-check that you don't have a join key with a very large number of records associated with it in one or both datasets. That kind of skew pushes all of those records into a single partition, which can then OOM the executor when that partition gets processed in the next stage. A quick way to check for it is sketched below.
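Something like the following (a minimal sketch against the Spark 1.6 DataFrame API; leftDF, the joinKey column, and the input path are placeholders for your own data) will surface the heaviest keys:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.desc

object SkewCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("skew-check"))
    val sqlContext = new SQLContext(sc)

    // Placeholder input; point this at one side of the problematic join.
    val leftDF = sqlContext.read.parquet("hdfs:///path/to/left")

    // Count rows per join key and list the heaviest keys first. A handful of
    // keys with counts orders of magnitude above the rest usually explains
    // a single shuffle partition blowing up during the join.
    leftDF.groupBy("joinKey")
      .count()
      .orderBy(desc("count"))
      .show(20, false)

    sc.stop()
  }
}

If a few keys dominate, salting those keys (or handling them separately) before the join tends to help more than just adding memory.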
On Wed, Jan 9, 2019 at 4:24 PM William Shen <wills...@marinsoftware.com> wrote:

> Thank you for the tips. We are running Spark 1.6 (Scala), and the OOM happens
> with SparkSQL trying to join a few large datasets together for
> processing/transformation...
>
> On Wed, Jan 9, 2019 at 3:42 PM Ramandeep Singh <rs5...@nyu.edu> wrote:
>
>> Hi,
>>
>> Here are a few suggestions that you can try.
>>
>> OOM issues that I have faced with Spark:
>> *Not enough shuffle partitions*: increase them.
>> *Low memory overhead settings*: boost the overhead to around 12 percent. You
>> usually see this as an error message in your executors.
>> *Large executor configs*: they can be problematic; smaller, more numerous
>> executors are preferred over larger and fewer executors.
>> Changing the GC algorithm.
>>
>> http://orastack.com/spark-scaling-to-large-datasets.html
>>
>> Here are a few tips
>>
>> On Wed, Jan 9, 2019 at 1:55 PM Dillon Dukek
>> <dillon.du...@placed.com.invalid> wrote:
>>
>>> Hi William,
>>>
>>> Just to get started, can you describe the Spark version you are using
>>> and the language? It doesn't sound like you are using PySpark, but
>>> problems arising from that can be different, so I just want to be sure.
>>> Also, can you talk through the scenario under which you are hitting
>>> this error, i.e. the order of operations for the transformations you are
>>> applying?
>>>
>>> If you're set on getting a heap dump, probably the easiest way
>>> would be to monitor an active application through the Spark UI, then
>>> grab a heap dump from the executor Java process when you notice one that's
>>> having problems.
>>>
>>> On Wed, Jan 9, 2019 at 10:18 AM William Shen <wills...@marinsoftware.com>
>>> wrote:
>>>
>>>> Hi there,
>>>>
>>>> We've encountered Spark executor Java OOM issues in our Spark
>>>> application. Any tips on how to troubleshoot and identify what objects are
>>>> occupying the heap? In the past, dealing with JVM OOM, we've worked with
>>>> analyzing heap dumps, but we are having a hard time locating a Spark
>>>> heap dump after a crash, and we also anticipate that these heap dumps will
>>>> be huge (since our nodes have a large memory allocation) and may be
>>>> difficult to analyze locally. Can someone share their experience dealing
>>>> with Spark OOM?
>>>>
>>>> Thanks!
>>>
>>
>> --
>> Regards,
>> Ramandeep Singh
>> Blog: http://ramannanda.blogspot.com
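P.S. On the heap dump question from the original message: one option (the values and paths below are only placeholders to adapt) is to have the executor JVMs write a dump when they hit OOM, alongside raising the shuffle partitions and memory overhead mentioned above, e.g.:

spark-submit \
  --conf spark.sql.shuffle.partitions=2000 \
  --conf spark.yarn.executor.memoryOverhead=4096 \
  --conf "spark.executor.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/executor-dumps" \
  ... your usual submit arguments ...

Keep in mind the dump is written to the worker node's local disk, so you'll want to copy it off before the node or container gets cleaned up, and a dump from a large executor heap can be tens of GB.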