Hello, I am working with Spark on Dataiku using spot instances on S3 (not on-demand instances). I had no problems until I moved from Spark 3.3 to Spark 3.4. Since then I consistently get the failure below, and I cannot work out which configuration change in the new Spark version leads to it:
*org.apache.spark.SparkException: Job aborted due to stage failure: Authorized committer (attemptNumber=0, stage=2, partition=357) failed; but task commit success, data duplication may happen.*

Here is the full log:

[09:46:42] [INFO] [dku.utils] - [2024/07/17-09:46:42.680] [Thread-6] [ERROR] [org.apache.spark.sql.execution.datasources.FileFormatWriter] - Aborting job 2271622d-848a-4719-ad86-81d951235dbb. [09:46:42] [INFO] [dku.utils] - org.apache.spark.SparkException: Job aborted due to stage failure: Authorized committer (attemptNumber=0, stage=2, partition=715) failed; but task commit success, data duplication may happen. reason=ExecutorLostFailure(1,false,Some(The executor with id 1 was deleted by a user or the framework.)) [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2785) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2721) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2720) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) ~[scala-library-2.12.17.jar:?] [09:46:42] [INFO] [dku.utils] - at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) ~[scala-library-2.12.17.jar:?] [09:46:42] [INFO] [dku.utils] - at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) ~[scala-library-2.12.17.jar:?] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2720) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleStageFailed$1(DAGScheduler.scala:1199) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleStageFailed$1$adapted(DAGScheduler.scala:1199) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at scala.Option.foreach(Option.scala:407) ~[scala-library-2.12.17.jar:?] 
[09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.handleStageFailed(DAGScheduler.scala:1199) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2981) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2923) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2912) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:971) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.SparkContext.runJob(SparkContext.scala:2263) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$4(FileFormatWriter.scala:307) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.FileFormatWriter$.writeAndCommit(FileFormatWriter.scala:271) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeWrite(FileFormatWriter.scala:304) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:190) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:190) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512) ~[spark-catalyst_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104) ~[spark-catalyst_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512) ~[spark-catalyst_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31) ~[spark-catalyst_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) ~[spark-catalyst_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) ~[spark-catalyst_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31) ~[spark-catalyst_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31) ~[spark-catalyst_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488) ~[spark-catalyst_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:133) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:856) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:387) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:360) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:789) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at com.dataiku.dip.spark.ParquetWriter$.saveParquetDataset(ParquetWriter.scala:61) ~[dss-spark-main_2.12-5.1.jar:?] 
[09:46:42] [INFO] [dku.utils] - at com.dataiku.dip.spark.StdDataikuSparkContext.saveHDFSableWithFastPathIfPossible(StdDataikuSparkContext.scala:898) ~[dss-spark-main_2.12-5.1.jar:?] [09:46:42] [INFO] [dku.utils] - at com.dataiku.dip.spark.StdDataikuSparkContext.internalSave(StdDataikuSparkContext.scala:810) ~[dss-spark-main_2.12-5.1.jar:?] [09:46:42] [INFO] [dku.utils] - at com.dataiku.dss.spark.DataikuSparkContext.save(DataikuSparkContext.scala:83) ~[dss-spark-public_2.12-2.0.jar:?] [09:46:42] [INFO] [dku.utils] - at com.dataiku.dss.spark.DataikuSparkContext.save$(DataikuSparkContext.scala:75) ~[dss-spark-public_2.12-2.0.jar:?] [09:46:42] [INFO] [dku.utils] - at com.dataiku.dip.spark.StdDataikuSparkContext.save(StdDataikuSparkContext.scala:58) ~[dss-spark-main_2.12-5.1.jar:?] [09:46:42] [INFO] [dku.utils] - at com.dataiku.dip.spark.StdDataikuSparkContext.savePyDataFrame(StdDataikuSparkContext.scala:611) ~[dss-spark-main_2.12-5.1.jar:?] [09:46:42] [INFO] [dku.utils] - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_412] [09:46:42] [INFO] [dku.utils] - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_412] [09:46:42] [INFO] [dku.utils] - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_412] [09:46:42] [INFO] [dku.utils] - at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_412] [09:46:42] [INFO] [dku.utils] - at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) ~[py4j-0.10.9.7.jar:?] [09:46:42] [INFO] [dku.utils] - at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) ~[py4j-0.10.9.7.jar:?] [09:46:42] [INFO] [dku.utils] - at py4j.Gateway.invoke(Gateway.java:282) ~[py4j-0.10.9.7.jar:?] [09:46:42] [INFO] [dku.utils] - at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) ~[py4j-0.10.9.7.jar:?] [09:46:42] [INFO] [dku.utils] - at py4j.commands.CallCommand.execute(CallCommand.java:79) ~[py4j-0.10.9.7.jar:?] [09:46:42] [INFO] [dku.utils] - at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) ~[py4j-0.10.9.7.jar:?] [09:46:42] [INFO] [dku.utils] - at py4j.ClientServerConnection.run(ClientServerConnection.java:106) ~[py4j-0.10.9.7.jar:?] [09:46:42] [INFO] [dku.utils] - at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_412] ..... [09:46:46] [INFO] [dku.utils] - *************** Recipe code failed ************** [09:46:46] [INFO] [dku.utils] - Begin Python stack [09:46:46] [INFO] [dku.utils] - Traceback (most recent call last): [09:46:46] [INFO] [dku.utils] - [2024/07/17-09:46:46.581] [Thread-6] [INFO] [org.apache.spark.SparkContext] - SparkContext is stopping with exitCode 0. 
[09:46:46] [INFO] [dku.utils] - File "/data/dataiku/dss_data/jobs/PSTRAINFREUDAILY/Scenario_build_ps_3ans__NP___s_3ans__NP___dv_3ans__NP___c_3ans__NP___v_3ans__NP___t_3ans__NP__2024-07-17T09-40-06.948/compute_s_3ans_NP/sparkbased-recipe/outzQEdlS58jTxO/python-exec-wrapper.py", line 204, in <module> [09:46:46] [INFO] [dku.utils] - exec(f.read()) [09:46:46] [INFO] [dku.utils] - File "<string>", line 36, in <module> [09:46:46] [INFO] [dku.utils] - File "/opt/dataiku-dss-12.6.4/python/dataiku/spark/__init__.py", line 198, in write_with_schema [09:46:46] [INFO] [dku.utils] - write_dataframe(dataset, dataframe, delete_first) [09:46:46] [INFO] [dku.utils] - File "/opt/dataiku-dss-12.6.4/python/dataiku/spark/__init__.py", line 188, in write_dataframe [09:46:46] [INFO] [dku.utils] - dsc.savePyDataFrame(dataset.full_name, dataframe._jdf, dataset.writePartition, delete_first and "OVERWRITE" or "APPEND", False) [09:46:46] [INFO] [dku.utils] - File "/opt/dataiku-dss-12.6.4/spark-standalone-home/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__ [09:46:46] [INFO] [dku.utils] - return_value = get_return_value( [09:46:46] [INFO] [dku.utils] - File "/opt/dataiku-dss-12.6.4/spark-standalone-home/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco [09:46:46] [INFO] [dku.utils] - return f(*a, **kw) [09:46:46] [INFO] [dku.utils] - File "/opt/dataiku-dss-12.6.4/spark-standalone-home/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value [09:46:46] [INFO] [dku.utils] - raise Py4JJavaError( [09:46:46] [DEBUG] [dku.spark.k8s] - Found savePyDataFrame failure in log: py4j.protocol.Py4JJavaError: An error occurred while calling o123.savePyDataFrame. [09:46:46] [INFO] [dku.utils] - py4j.protocol.Py4JJavaError: An error occurred while calling o123.savePyDataFrame. [09:46:46] [DEBUG] [dku.spark.k8s] - Checking line after the savePyDataFrame failure in log: : org.apache.spark.SparkException: Job aborted due to stage failure: Authorized committer (attemptNumber=0, stage=2, partition=715) failed; but task commit success, data duplication may happen. reason=ExecutorLostFailure(1,false,Some(The executor with id 1 was deleted by a user or the framework.)) [09:46:46] [INFO] [dku.utils] - [2024/07/17-09:46:46.591] [Thread-6] [INFO] [org.sparkproject.jetty.server.AbstractConnector] - Stopped Spark@53a23618{HTTP/1.1, (http/1.1)}{0.0.0.0:4041} [09:46:46] [INFO] [dku.utils] - : org.apache.spark.SparkException: Job aborted due to stage failure: Authorized committer (attemptNumber=0, stage=2, partition=715) failed; but task commit success, data duplication may happen. 
reason=ExecutorLostFailure(1,false,Some(The executor with id 1 was deleted by a user or the framework.)) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2785) [09:46:46] [INFO] [dku.utils] - [2024/07/17-09:46:46.595] [Thread-6] [INFO] [org.apache.spark.ui.SparkUI] - Stopped Spark web UI at http://ip-100-73-16-11.eu-west-3.compute.internal:4041 [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2721) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2720) [09:46:46] [INFO] [dku.utils] - at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) [09:46:46] [INFO] [dku.utils] - at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) [09:46:46] [INFO] [dku.utils] - at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2720) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleStageFailed$1(DAGScheduler.scala:1199) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleStageFailed$1$adapted(DAGScheduler.scala:1199) [09:46:46] [INFO] [dku.utils] - at scala.Option.foreach(Option.scala:407) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.handleStageFailed(DAGScheduler.scala:1199) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2981) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2923) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2912) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:971) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.SparkContext.runJob(SparkContext.scala:2263) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$4(FileFormatWriter.scala:307) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.FileFormatWriter$.writeAndCommit(FileFormatWriter.scala:271) [09:46:46] [INFO] [dku.utils] - [2024/07/17-09:46:46.599] [Thread-6] [INFO] [org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend] - Shutting down all executors [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeWrite(FileFormatWriter.scala:304) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:190) [09:46:46] [INFO] [dku.utils] - [2024/07/17-09:46:46.600] [dispatcher-CoarseGrainedScheduler] [INFO] [org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint] - Asking each executor to shut down [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:190) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113) [09:46:46] [INFO] [dku.utils] - at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94) [09:46:46] [INFO] [dku.utils] - [2024/07/17-09:46:46.604] [Thread-6] [WARN] [org.apache.spark.scheduler.cluster.k8s.ExecutorPodsWatchSnapshotSource] - Kubernetes client has been closed. [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:133) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:856) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:387) [09:46:46] [INFO] [dku.utils] - at 
org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:360) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:789) [09:46:46] [INFO] [dku.utils] - at com.dataiku.dip.spark.ParquetWriter$.saveParquetDataset(ParquetWriter.scala:61) [09:46:46] [INFO] [dku.utils] - at com.dataiku.dip.spark.StdDataikuSparkContext.saveHDFSableWithFastPathIfPossible(StdDataikuSparkContext.scala:898) [09:46:46] [INFO] [dku.utils] - at com.dataiku.dip.spark.StdDataikuSparkContext.internalSave(StdDataikuSparkContext.scala:810) [09:46:46] [INFO] [dku.utils] - at com.dataiku.dss.spark.DataikuSparkContext.save(DataikuSparkContext.scala:83) [09:46:46] [INFO] [dku.utils] - at com.dataiku.dss.spark.DataikuSparkContext.save$(DataikuSparkContext.scala:75) [09:46:46] [INFO] [dku.utils] - at com.dataiku.dip.spark.StdDataikuSparkContext.save(StdDataikuSparkContext.scala:58) [09:46:46] [INFO] [dku.utils] - at com.dataiku.dip.spark.StdDataikuSparkContext.savePyDataFrame(StdDataikuSparkContext.scala:611) [09:46:46] [INFO] [dku.utils] - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [09:46:46] [INFO] [dku.utils] - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [09:46:46] [INFO] [dku.utils] - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [09:46:46] [INFO] [dku.utils] - at java.lang.reflect.Method.invoke(Method.java:498) [09:46:46] [INFO] [dku.utils] - at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) [09:46:46] [INFO] [dku.utils] - at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) [09:46:46] [INFO] [dku.utils] - at py4j.Gateway.invoke(Gateway.java:282) [09:46:46] [INFO] [dku.utils] - at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) [09:46:46] [INFO] [dku.utils] - at py4j.commands.CallCommand.execute(CallCommand.java:79) [09:46:46] [INFO] [dku.utils] - at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) [09:46:46] [INFO] [dku.utils] - at py4j.ClientServerConnection.run(ClientServerConnection.java:106) [09:46:46] [INFO] [dku.utils] - at java.lang.Thread.run(Thread.java:750)

I tried changing the Spark configuration to get the same behaviour as before the upgrade:

spark.executor.memory = "10g"
spark.sql.shuffle.partitions = 2000
spark.kubernetes.memoryOverheadFactor = 0.1
spark.port.maxRetries = 200
spark.kubernetes.executor.deleteOnTermination = True
spark.kubernetes.allocation.batch.size = 5
spark.executor.cores = 2
spark.dynamicAllocation.maxExecutors = 32
spark.dynamicAllocation.enabled = False
spark.shuffle.service.enabled = False
spark.submit.deployMode = "client"
spark.dynamicAllocation.minExecutors = 9
spark.dynamicAllocation.shuffleTracking.enabled = True
spark.files.maxPartitionBytes = 512
spark.executor.instances = 21
spark.kubernetes.container.image.pullPolicy = "Always"
spark.jars.repositories = "https://artifact.socrate.vsct.fr/artifactory/mavencentral-mvn-all-remote"
spark.jars.packages = "org.apache.spark:spark-hadoop-cloud_2.12:3.2.2"
spark.sql.parquet.int96RebaseModeInWrite = "CORRECTED"
spark.sql.parquet.int96RebaseModeInRead = "CORRECTED"
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version = 2
spark.sql.parquet.enableVectorizedReader = False
spark.sql.parquet.writeLegacyFormat = True
spark.sql.parquet.binaryAsString = True
spark.storage.decommission.enabled = True
spark.storage.decommission.shuffleBlocks.enabled = True
spark.storage.decommission.rddBlocks.enabled = True
spark.decommission.enabled = True
spark.jars.ivy = "/data/dataiku/.ivy2"
spark.sql.sources.commitProtocolClass = "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol"
spark.speculation = True
spark.speculation.quantile = 0.90
spark.speculation.multiplier = 1.5
spark.task.maxFailures = 10
spark.sql.parquet.output.committer.class = "org.apache.parquet.hadoop.ParquetOutputCommitter"
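For reference, this is roughly how the main settings above would look if they were applied programmatically when building the session. This is only a minimal, illustrative PySpark sketch: in Dataiku DSS these properties are normally set in the recipe's Spark configuration rather than in recipe code, and the application name below is just a placeholder.

from pyspark.sql import SparkSession

# Illustrative sketch only: a subset of the settings listed above, applied
# directly on the SparkSession builder. Executor-level settings must be in
# place before the application starts, which is why DSS applies them through
# its Spark configuration rather than inside the recipe.
spark = (
    SparkSession.builder
    .appName("spark34-upgrade-test")  # placeholder name
    .config("spark.executor.memory", "10g")
    .config("spark.executor.cores", "2")
    .config("spark.executor.instances", "21")
    .config("spark.dynamicAllocation.enabled", "false")
    .config("spark.sql.shuffle.partitions", "2000")
    .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
    .config("spark.sql.sources.commitProtocolClass",
            "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol")
    .config("spark.sql.parquet.output.committer.class",
            "org.apache.parquet.hadoop.ParquetOutputCommitter")
    .getOrCreate()
)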
I need your help to solve this problem, please. Thank you.

*GABOUJ Wafa*
*Data Engineer*
*Tél: +33612693579*