You should be able to split a large job into more manageable jobs based
on stages using checkpoints. If a job fails, it can be restarted from
the latest checkpoint, saving time and resources; checkpoints can thus
serve as recovery points.
Smaller stages can be optimized independently, leading to better
resource utilization and faster execution.
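
For illustration, a minimal sketch of stage-wise checkpointing (the
paths and names here are hypothetical, not from the job in question):

import org.apache.spark.sql.SparkSession

object CheckpointedStages {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("checkpoint-demo")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Checkpoint files should live on reliable storage (HDFS on a cluster);
    // the local path here is only for a local-mode sketch.
    sc.setCheckpointDir("/tmp/spark-checkpoints")

    val input = sc.textFile("/tmp/input.txt")

    // Stage 1: an expensive transformation, checkpointed so a restarted
    // job can resume from here instead of recomputing the full lineage.
    val stageOne = input.map(_.toUpperCase)
    stageOne.checkpoint()
    stageOne.count() // an action forces the checkpoint to be written

    // Stage 2 builds on the checkpointed data, not on the raw input.
    val stageTwo = stageOne.filter(_.nonEmpty)
    println(stageTwo.count())

    spark.stop()
  }
}

Note that checkpoint() also truncates the RDD lineage, which is part of
why it helps with very long-running jobs.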

regards,
Guru

On Tue, Nov 12, 2024 at 8:54 AM Ashwani Pundir
<ashwani.pund...@gmail.com> wrote:
>
> [TYPO CORRECTION]
> "I am manually un-persisting various RDDs progressively and that might have 
> something to do with this error. "
>
> On Tue, Nov 12, 2024 at 8:51 AM Ashwani Pundir <ashwani.pund...@gmail.com> 
> wrote:
>>
>> Hi Guru,
>>
>> Many thanks for taking the time to look into this issue. I really 
>> appreciate it.
>>
>> To your suggestion, I have already checked, and it seems this problem is 
>> not caused by nulls in the input data. The issue only happens when the 
>> job processing duration is >= 10 months (for a reason I have not been 
>> able to figure out); when the job is broken down into smaller time 
>> durations over the same input data, it completes successfully.
>> To me, it seems to have something to do with the DISK_ONLY persistence 
>> that I am using in my job. I am manually persisting RDDs progressively, and that 
>> might have something to do with this error. But as per my understanding, 
>> even if the blocks are not available after the un-persists, the RDD should 
>> be recomputed rather than throw a NullPointerException.
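>>
>> For reference, here is a simplified sketch of the persist/un-persist 
>> pattern I am describing (the names are illustrative, not the actual 
>> code from my job):
>>
>> import org.apache.spark.storage.StorageLevel
>>
>> // `inputRdd` and `computeStage` stand in for the real pipeline
>> val stageResult = computeStage(inputRdd).persist(StorageLevel.DISK_ONLY)
>> stageResult.count() // an action materializes the DISK_ONLY blocks
>> // ... downstream stages consume stageResult ...
>> // A blocking un-persist waits until the blocks are actually removed,
>> // instead of returning while tasks may still be reading them.
>> stageResult.unpersist(blocking = true)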
>>
>> Looking forward to some guidance on how I should proceed further.
>>
>> Regards,
>> Ashwani
>>
>>
>>
>> On Mon, Nov 11, 2024 at 7:23 PM Gurunandan <gurunandan....@gmail.com> wrote:
>>>
>>> Hi Ashwani,
>>> Please verify the input data by ensuring that the data being processed
>>> is valid and free of null values or unexpected data types.
>>> If the data undergoes complex transformations before sorting, review
>>> those transformations and verify that they don't introduce
>>> inconsistencies or null values.
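>>>
>>> For example, a quick null audit (a minimal sketch, assuming the data
>>> is already in a DataFrame named `df`):
>>>
>>> import org.apache.spark.sql.functions.{col, count, when}
>>>
>>> // count the nulls in every column before the sort/join stages run
>>> val nullCounts = df.select(df.columns.map(c =>
>>>   count(when(col(c).isNull, c)).alias(c)): _*)
>>> nullCounts.show()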
>>>
>>> regards,
>>> Guru
>>>
>>> On Mon, Nov 11, 2024 at 6:06 PM Ashwani Pundir
>>> <ashwani.pund...@gmail.com> wrote:
>>> >
>>> > Dear Spark Devs,
>>> >
>>> > I am reaching out because I am struggling to find the root cause of the 
>>> > exception below.
>>> > (I have spent almost 4 days trying to figure out the root cause of this 
>>> > issue. I searched various forums, including Stack Overflow, but did not 
>>> > find anyone reporting this kind of error.)
>>> > -----------------------------------------------------------------
>>> > 2024-11-11 13:56:43,932  [Executor task launch worker for task 0.0 in 
>>> > stage 28.0 (TID 1007)] DEBUG - o.a.s.s.e.c. Compressor for [AggAllLevel]: 
>>> > org.apache.spark.sql.execution.columnar.compression.PassThrough$Encoder@766c1b92,
>>> >  ratio: 1.0
>>> > 2024-11-11 13:56:43,936  [Executor task launch worker for task 0.0 in 
>>> > stage 28.0 (TID 1007)] WARN  - o.a.s.s.BlockManager Putting block 
>>> > rdd_66_0 failed due to exception java.lang.NullPointerException: Cannot 
>>> > invoke 
>>> > "org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.numRecords()"
>>> >  because "this.inMemSorter" is null.
>>> > 2024-11-11 13:56:43,937  [Executor task launch worker for task 0.0 in 
>>> > stage 28.0 (TID 1007)] WARN  - o.a.s.s.BlockManager Block rdd_66_0 could 
>>> > not be removed as it was not found on disk or in memory
>>> > 2024-11-11 13:56:43,937  [dispatcher-BlockManagerMaster] DEBUG - 
>>> > o.a.s.s.BlockManagerMasterEndpoint Updating block info on master rdd_66_0 
>>> > for BlockManagerId(driver, ibpri011662.emeter.com, 17623, None)
>>> > 2024-11-11 13:56:43,937  [Executor task launch worker for task 0.0 in 
>>> > stage 28.0 (TID 1007)] DEBUG - o.a.s.s.BlockManagerMaster Updated info of 
>>> > block rdd_66_0
>>> > 2024-11-11 13:56:43,937  [Executor task launch worker for task 0.0 in 
>>> > stage 28.0 (TID 1007)] DEBUG - o.a.s.s.BlockManager Told master about 
>>> > block rdd_66_0
>>> > 2024-11-11 13:56:43,941  [Executor task launch worker for task 0.0 in 
>>> > stage 28.0 (TID 1007)] ERROR - o.a.s.e.Executor Exception in task 0.0 in 
>>> > stage 28.0 (TID 1007)
>>> > java.lang.NullPointerException: Cannot invoke 
>>> > "org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.numRecords()"
>>> >  because "this.inMemSorter" is null
>>> > at 
>>> > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:478)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:138)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.sort_addToSorter_0$(Unknown
>>> >  Source) ~[?:?]
>>> > at 
>>> > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown
>>> >  Source) ~[?:?]
>>> > at 
>>> > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage8.smj_findNextJoinRows_0$(Unknown
>>> >  Source) ~[?:?]
>>> > at 
>>> > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage8.processNext(Unknown
>>> >  Source) ~[?:?]
>>> > at 
>>> > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer$$anon$1.hasNext(InMemoryRelation.scala:119)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$2.hasNext(InMemoryRelation.scala:288)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at scala.collection.Iterator$$anon$6.hasNext(Iterator.scala:477) 
>>> > ~[em-spark-assembly.jar!/:?]
>>> > at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583) 
>>> > ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificColumnarIterator.hasNext(Unknown
>>> >  Source) ~[?:?]
>>> > at 
>>> > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
>>> >  Source) ~[?:?]
>>> > at 
>>> > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.smj_findNextJoinRows_0$(Unknown
>>> >  Source) ~[?:?]
>>> > at 
>>> > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>>> >  Source) ~[?:?]
>>> > at 
>>> > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer$$anon$1.hasNext(InMemoryRelation.scala:119)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$2.hasNext(InMemoryRelation.scala:288)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at scala.collection.Iterator$$anon$6.hasNext(Iterator.scala:477) 
>>> > ~[em-spark-assembly.jar!/:?]
>>> > at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583) 
>>> > ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificColumnarIterator.hasNext(Unknown
>>> >  Source) ~[?:?]
>>> > at 
>>> > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
>>> >  Source) ~[?:?]
>>> > at 
>>> > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.smj_findNextJoinRows_0$(Unknown
>>> >  Source) ~[?:?]
>>> > at 
>>> > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>>> >  Source) ~[?:?]
>>> > at 
>>> > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer$$anon$1.hasNext(InMemoryRelation.scala:119)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$2.hasNext(InMemoryRelation.scala:288)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at scala.collection.Iterator$$anon$6.hasNext(Iterator.scala:477) 
>>> > ~[em-spark-assembly.jar!/:?]
>>> > at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583) 
>>> > ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificColumnarIterator.hasNext(Unknown
>>> >  Source) ~[?:?]
>>> > at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583) 
>>> > ~[em-spark-assembly.jar!/:?]
>>> > at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583) 
>>> > ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.serializer.SerializationStream.writeAll(Serializer.scala:139)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.serializer.SerializerManager.dataSerializeStream(SerializerManager.scala:177)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$6(BlockManager.scala:1635)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$6$adapted(BlockManager.scala:1633)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at org.apache.spark.storage.DiskStore.put(DiskStore.scala:88) 
>>> > ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1633)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1524)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1588)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1389)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.storage.BlockManager.getOrElseUpdateRDDBlock(BlockManager.scala:1343)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:379) 
>>> > ~[em-spark-assembly.jar!/:?]
>>> > at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) 
>>> > ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) 
>>> > ~[em-spark-assembly.jar!/:?]
>>> > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367) 
>>> > ~[em-spark-assembly.jar!/:?]
>>> > at org.apache.spark.rdd.RDD.iterator(RDD.scala:331) 
>>> > ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166) 
>>> > ~[em-spark-assembly.jar!/:?]
>>> > at org.apache.spark.scheduler.Task.run(Task.scala:141) 
>>> > ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at 
>>> > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
>>> >  ~[em-spark-assembly.jar!/:?]
>>> > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94) 
>>> > ~[em-spark-assembly.jar!/:?]
>>> > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623) 
>>> > [em-spark-assembly.jar!/:?]
>>> > at 
>>> > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>>> >  [?:?]
>>> > at 
>>> > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>>> >  [?:?]
>>> > at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
>>> > --------------------------------------------------------
>>> > I need your help in understanding when this issue can occur.
>>> > Please let me know if any further information is needed from my 
>>> > side. (Attaching the full logs for reference.)
>>> >
>>> > PS: Spark is running in local mode.
>>> >
>>> > Regards,
>>> > Ashwani
>>> >

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
