Hello, I am facing two different issues with Spark in my project that are driving me crazy. I am currently running on EMR (Spark 1.5.2 + YARN), using the "--executor-memory 40G" option.
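To make the numbers in this setup concrete, here is a minimal sketch of how the memory YARN enforces on an executor container relates to "--executor-memory". It assumes the container request is the executor heap plus spark.yarn.executor.memoryOverhead, and that the default overhead in Spark 1.5.x is the larger of 10% of the executor memory and 384 MB; both constants are assumptions to check against the documentation for your version. The function name is hypothetical, and the rounding YARN applies to its allocation increment is not modeled.

```python
# Sketch only: the two constants below are assumed Spark 1.5.x defaults,
# not values taken from the original post.
OVERHEAD_PERCENT = 10    # assumed default: 10% of executor memory
OVERHEAD_MIN_MB = 384    # assumed default floor, in MB


def yarn_container_mb(executor_memory_mb, overhead_mb=None):
    """Return the approximate memory (MB) requested from YARN per executor.

    executor_memory_mb: the heap size passed via --executor-memory.
    overhead_mb: an explicit spark.yarn.executor.memoryOverhead value,
                 or None to use the assumed default rule.
    """
    if overhead_mb is None:
        overhead_mb = max(executor_memory_mb * OVERHEAD_PERCENT // 100,
                          OVERHEAD_MIN_MB)
    return executor_memory_mb + overhead_mb


# 40 GB heap with the assumed default overhead.
default_limit = yarn_container_mb(40 * 1024)
# 40 GB heap with the 8 GB overhead described under Problem #1.
raised_limit = yarn_container_mb(40 * 1024, overhead_mb=8 * 1024)
print(default_limit, raised_limit)
```

Under these assumptions, everything the executor process uses outside the Java heap (thread stacks, PermGen, direct and native buffers) has to fit in the overhead portion, which is why a heap that stays under -Xmx can still exceed the container limit.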

Problem #1
=========

Some of my processes get killed by YARN because the container exceeds the physical memory YARN assigned to it. I have been able to work around this issue by increasing the spark.yarn.executor.memoryOverhead parameter to 8G, but that doesn't seem like a good solution.

My understanding is that the JVM that runs my Spark process gets 40 GB of heap memory (-Xmx40G), and if there is memory pressure in the process, the GC should kick in to ensure that the heap never exceeds those 40 GB. My PermGen is set to 510MB, but that is a very long way from the 8GB I need to set as overhead.

This seems to happen when I .cache() very big RDDs and then perform operations that require shuffling (cogroup & co.).

- What is using all that off-heap memory?
- Are there any tools in the Spark ecosystem that might help me debug this?

Problem #2
=========

Some tasks fail because the heartbeat didn't get back to the master within 120 seconds. Again, I can more or less work around this by increasing the timeout to 5 minutes, but I don't feel this addresses the real problem.

- Does the heartbeat have its own thread, or would a long-running .map() block the heartbeat?
- What conditions would prevent the heartbeat from being sent?

Many thanks in advance for any help with this,
Ximo.