Hi,

It's been a while since I worked with Spark Standalone, but I'd check the logs of the workers. How do you spark-submit the app?
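In case it helps, this is roughly what a submission to a standalone cluster looks like with the memory settings made explicit (the master URL, class name, and jar path below are placeholders, not taken from your setup):

  ./bin/spark-submit \
    --master spark://<master-host>:7077 \
    --deploy-mode cluster \
    --driver-memory 2g \
    --executor-memory 4g \
    --class com.example.StreamingApp \
    /path/to/streaming-app.jar

With --deploy-mode cluster the driver itself runs on one of the workers (that's the driver-20200508153502-1291 process in your logs), so its stdout/stderr end up in that worker's work directory.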
Did you check the /grid/1/spark/work/driver-20200508153502-1291 directory?

Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski

On Fri, May 8, 2020 at 2:32 PM Hrishikesh Mishra <sd.hri...@gmail.com> wrote:

> Thanks Jacek for the quick response.
> Due to our system constraints, we can't move to Structured Streaming now,
> but YARN can definitely be tried out.
>
> My problem is that I'm not able to figure out where the issue is: the
> driver, an executor, or the worker. Even the exceptions are clueless.
> Please see the exception below; I'm unable to spot the cause of the OOM.
>
> 20/05/08 15:36:55 INFO Worker: Asked to kill driver driver-20200508153502-1291
> 20/05/08 15:36:55 INFO DriverRunner: Killing driver process!
> 20/05/08 15:36:55 INFO CommandUtils: Redirection to /grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed
> 20/05/08 15:36:55 INFO CommandUtils: Redirection to /grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed
> 20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application app-20200508153654-11776 removed, cleanupLocalDirs = true
> 20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was killed by user
> 20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
> java.lang.OutOfMemoryError: Java heap space
> 20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[dispatcher-event-loop-6,5,main]
> java.lang.OutOfMemoryError: Java heap space
> 20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
> java.lang.OutOfMemoryError: Java heap space
> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
> 20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called
> 20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory /grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922
>
> On Fri, May 8, 2020 at 5:14 PM Jacek Laskowski <ja...@japila.pl> wrote:
>
>> Hi,
>>
>> Sorry for being perhaps too harsh, but when you asked "Am I missing
>> something?" and I noticed "Kafka Direct Stream" and "Spark Standalone
>> Cluster", I immediately thought "Yeah... please upgrade your Spark env
>> to use Spark Structured Streaming at the very least and/or use YARN as
>> the cluster manager".
>>
>> Another thought was that the user code (your code) could be leaking
>> resources, so Spark eventually reports heap-related errors that may not
>> necessarily be Spark's.
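>>
>> If you suspect a leak, one way to get hard evidence is to have the JVMs
>> write a heap dump when they hit the OOM, then open the .hprof in a tool
>> like Eclipse MAT. A minimal sketch (the dump path is a placeholder and
>> must exist on the nodes):
>>
>>   ./bin/spark-submit \
>>     --conf "spark.driver.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dumps" \
>>     --conf "spark.executor.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dumps" \
>>     ...
>>
>> Whichever process leaves the .hprof behind tells you whether it's the
>> driver or an executor that is running out of heap.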
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://about.me/JacekLaskowski
>> "The Internals Of" Online Books <https://books.japila.pl/>
>> Follow me on https://twitter.com/jaceklaskowski
>>
>> On Thu, May 7, 2020 at 1:12 PM Hrishikesh Mishra <sd.hri...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> I am getting an out-of-memory error in the worker log for streaming
>>> jobs every couple of hours, after which the worker dies. There is no
>>> shuffle, no aggregation, no caching in the job; it's just a
>>> transformation. I'm not able to identify where the problem is, the
>>> driver or an executor. And why does the worker die after the OOM? The
>>> streaming job should die, not the worker. Am I missing something?
>>>
>>> Driver memory: 2g
>>> Executor memory: 4g
>>>
>>> Spark version: 2.4
>>> Kafka Direct Stream
>>> Spark Standalone Cluster
>>>
>>> 20/05/06 12:52:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
>>> 20/05/06 12:53:03 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[ExecutorRunner for app-20200506124717-10226/0,5,main]
>>> java.lang.OutOfMemoryError: Java heap space
>>> at org.apache.xerces.util.XMLStringBuffer.append(Unknown Source)
>>> at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)
>>> at org.apache.xerces.impl.XMLScanner.scanComment(Unknown Source)
>>> at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanComment(Unknown Source)
>>> at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
>>> at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>>> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>>> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>>> at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>>> at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>>> at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
>>> at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
>>> at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
>>> at org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
>>> at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)
>>> at org.apache.spark.deploy.worker.ExecutorRunner.org$apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)
>>> at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
>>>
>>> 20/05/06 12:53:38 INFO DriverRunner: Worker shutting down, killing driver driver-20200505181719-1187
>>> 20/05/06 12:53:38 INFO DriverRunner: Killing driver process!
>>>
>>> Regards
>>> Hrishi
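
One detail worth noting in the trace above: the uncaught OutOfMemoryError is thrown in an "ExecutorRunner for app-..." thread, i.e. in the Worker daemon's own JVM, not in the driver or in an executor. The standalone daemons run with their own, fairly small heap that is configured separately from --driver-memory and --executor-memory. A minimal sketch, assuming the default daemon heap of 1g is still in place (the 2g value is just an illustration):

  # conf/spark-env.sh on every worker node
  # Heap for the Master/Worker daemons themselves (default: 1g)
  export SPARK_DAEMON_MEMORY=2g

Note that SPARK_WORKER_MEMORY is a different knob: it only caps the total memory the worker may hand out to executors and does not grow the worker daemon's own heap.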