Hi,

It's been a while since I last worked with Spark Standalone, but I'd start by
checking the workers' logs. How do you spark-submit the app?

Did you check the /grid/1/spark/work/driver-20200508153502-1291 directory?
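
For reference, a submission along these lines (a sketch only; the master URL,
class name, jar, and dump paths are placeholders) would also capture a heap
dump from whichever JVM actually runs out of memory:

  ./bin/spark-submit \
    --master spark://<master-host>:7077 \
    --driver-memory 2g \
    --executor-memory 4g \
    --conf "spark.driver.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/driver.hprof" \
    --conf "spark.executor.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/executor.hprof" \
    --class <your.main.Class> \
    your-app.jar

The .hprof file tells you whether it's the driver or an executor that blows
up. Note that the Worker daemon's own heap is configured separately
(SPARK_DAEMON_MEMORY in spark-env.sh), which may matter here since some of
your stack traces come from the Worker's own threads (ExecutorRunner,
dispatcher-event-loop).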

Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski



On Fri, May 8, 2020 at 2:32 PM Hrishikesh Mishra <sd.hri...@gmail.com>
wrote:

> Thanks, Jacek, for the quick response.
> Due to our system constraints, we can't move to Structured Streaming now,
> but YARN can definitely be tried out.
>
> My problem is that I'm unable to figure out where the issue lies: the
> driver, an executor, or the worker. Even the exceptions are clueless.
> Please see the exception below; I can't spot the cause of the OOM.
>
> 20/05/08 15:36:55 INFO Worker: Asked to kill driver
> driver-20200508153502-1291
>
> 20/05/08 15:36:55 INFO DriverRunner: Killing driver process!
>
> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
> /grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed
>
> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
> /grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed
>
> 20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application
> app-20200508153654-11776 removed, cleanupLocalDirs = true
>
> 20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was
> killed by user
>
> *20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception
> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
> stacktrace] was thrown by a user handler's exceptionCaught() method while
> handling the following exception:*
>
> *java.lang.OutOfMemoryError: Java heap space*
>
> *20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught exception
> in thread Thread[dispatcher-event-loop-6,5,main]*
>
> *java.lang.OutOfMemoryError: Java heap space*
>
> *20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception
> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
> stacktrace] was thrown by a user handler's exceptionCaught() method while
> handling the following exception:*
>
> *java.lang.OutOfMemoryError: Java heap space*
>
> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>
> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>
> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>
> 20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called
>
> 20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory
> /grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922
>
>
>
>
> On Fri, May 8, 2020 at 5:14 PM Jacek Laskowski <ja...@japila.pl> wrote:
>
>> Hi,
>>
>> Sorry for being perhaps too harsh, but when you asked "Am I missing
>> something?" and I noticed "Kafka Direct Stream" and "Spark Standalone
>> Cluster", I immediately thought: "Yeah... please upgrade your Spark env to
>> use Spark Structured Streaming at the very least, and/or use YARN as the
>> cluster manager".
>>
>> Another thought was that the user code (your code) could be leaking
>> resources, so Spark eventually reports heap-related errors that may not
>> necessarily be Spark's fault.
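>>
>> One way to test that (a sketch; <pid> is whatever jps reports for the
>> suspect JVM, be it the worker, the driver or an executor) is to watch its
>> heap while the job runs:
>>
>>   jps -lm                            # list the JVMs on the box
>>   jstat -gcutil <pid> 5s             # old gen (O) climbing to 100% => leak
>>   jmap -histo:live <pid> | head -n 20   # top heap consumers by class
>>
>> If the histogram is dominated by your own classes or by buffers your code
>> holds on to, the leak is in the user code rather than in Spark.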
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://about.me/JacekLaskowski
>> "The Internals Of" Online Books <https://books.japila.pl/>
>> Follow me on https://twitter.com/jaceklaskowski
>>
>>
>>
>> On Thu, May 7, 2020 at 1:12 PM Hrishikesh Mishra <sd.hri...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am getting an out-of-memory error in the worker log of my streaming
>>> jobs every couple of hours, after which the worker dies. There is no
>>> shuffle, no aggregation, and no caching in the job; it's just a
>>> transformation. I'm not able to identify where the problem is, the driver
>>> or the executor. And why does the worker die after the OOM? It's the
>>> streaming job that should die. Am I missing something?
>>>
>>> Driver Memory:  2g
>>> Executor memory: 4g
>>>
>>> Spark Version:  2.4
>>> Kafka Direct Stream
>>> Spark Standalone Cluster.
>>>
>>>
>>> 20/05/06 12:52:20 INFO SecurityManager: SecurityManager: authentication
>>> disabled; ui acls disabled; users  with view permissions: Set(root); groups
>>> with view permissions: Set(); users  with modify permissions: Set(root);
>>> groups with modify permissions: Set()
>>>
>>> 20/05/06 12:53:03 ERROR SparkUncaughtExceptionHandler: Uncaught
>>> exception in thread Thread[ExecutorRunner for
>>> app-20200506124717-10226/0,5,main]
>>>
>>> java.lang.OutOfMemoryError: Java heap space
>>>
>>> at org.apache.xerces.util.XMLStringBuffer.append(Unknown Source)
>>>
>>> at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)
>>>
>>> at org.apache.xerces.impl.XMLScanner.scanComment(Unknown Source)
>>>
>>> at
>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanComment(Unknown
>>> Source)
>>>
>>> at
>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
>>> Source)
>>>
>>> at
>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
>>> Source)
>>>
>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>
>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>
>>> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>>
>>> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>>>
>>> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>>>
>>> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>>>
>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
>>>
>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>>>
>>> at
>>> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>>>
>>> at
>>> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>>>
>>> at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>>>
>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
>>>
>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
>>>
>>> at
>>> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
>>>
>>> at
>>> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
>>>
>>> at
>>> org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
>>>
>>> at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)
>>>
>>> at
>>> org.apache.spark.deploy.worker.ExecutorRunner.org$apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)
>>>
>>> at
>>> org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
>>>
>>> 20/05/06 12:53:38 INFO DriverRunner: Worker shutting down, killing
>>> driver driver-20200505181719-1187
>>>
>>> 20/05/06 12:53:38 INFO DriverRunner: Killing driver process!
>>>
>>>
>>>
>>>
>>> Regards
>>> Hrishi
>>>
>>
