Hello, I am working with Spark on Dataiku using spot instances on S3 (not on-demand instances). I had no problems until I moved from Spark 3.3 to Spark 3.4. Since then I consistently get the failure below, and I cannot work out which configuration change in the new Spark version leads to it:
*org.apache.spark.SparkException: Job aborted due to stage failure: Authorized committer (attemptNumber=0, stage=2, partition=357) failed; but task commit success, data duplication may happen.*

Here is the full log:

[09:46:42] [INFO] [dku.utils] - [2024/07/17-09:46:42.680] [Thread-6] [ERROR] [org.apache.spark.sql.execution.datasources.FileFormatWriter] - Aborting job 2271622d-848a-4719-ad86-81d951235dbb. [09:46:42] [INFO] [dku.utils] - org.apache.spark.SparkException: Job aborted due to stage failure: Authorized committer (attemptNumber=0, stage=2, partition=715) failed; but task commit success, data duplication may happen. reason=ExecutorLostFailure(1,false,Some(The executor with id 1 was deleted by a user or the framework.)) [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2785) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2721) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2720) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) ~[scala-library-2.12.17.jar:?] [09:46:42] [INFO] [dku.utils] - at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) ~[scala-library-2.12.17.jar:?] [09:46:42] [INFO] [dku.utils] - at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) ~[scala-library-2.12.17.jar:?] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2720) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleStageFailed$1(DAGScheduler.scala:1199) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleStageFailed$1$adapted(DAGScheduler.scala:1199) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at scala.Option.foreach(Option.scala:407) ~[scala-library-2.12.17.jar:?] 
[09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.handleStageFailed(DAGScheduler.scala:1199) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2981) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2923) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2912) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:971) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.SparkContext.runJob(SparkContext.scala:2263) ~[spark-core_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$4(FileFormatWriter.scala:307) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.FileFormatWriter$.writeAndCommit(FileFormatWriter.scala:271) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeWrite(FileFormatWriter.scala:304) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:190) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:190) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512) ~[spark-catalyst_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104) ~[spark-catalyst_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512) ~[spark-catalyst_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31) ~[spark-catalyst_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) ~[spark-catalyst_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) ~[spark-catalyst_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31) ~[spark-catalyst_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31) ~[spark-catalyst_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488) ~[spark-catalyst_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:133) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:856) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:387) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:360) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:789) ~[spark-sql_2.12-3.4.1.jar:3.4.1] [09:46:42] [INFO] [dku.utils] - at com.dataiku.dip.spark.ParquetWriter$.saveParquetDataset(ParquetWriter.scala:61) ~[dss-spark-main_2.12-5.1.jar:?] 
[09:46:42] [INFO] [dku.utils] - at com.dataiku.dip.spark.StdDataikuSparkContext.saveHDFSableWithFastPathIfPossible(StdDataikuSparkContext.scala:898) ~[dss-spark-main_2.12-5.1.jar:?] [09:46:42] [INFO] [dku.utils] - at com.dataiku.dip.spark.StdDataikuSparkContext.internalSave(StdDataikuSparkContext.scala:810) ~[dss-spark-main_2.12-5.1.jar:?] [09:46:42] [INFO] [dku.utils] - at com.dataiku.dss.spark.DataikuSparkContext.save(DataikuSparkContext.scala:83) ~[dss-spark-public_2.12-2.0.jar:?] [09:46:42] [INFO] [dku.utils] - at com.dataiku.dss.spark.DataikuSparkContext.save$(DataikuSparkContext.scala:75) ~[dss-spark-public_2.12-2.0.jar:?] [09:46:42] [INFO] [dku.utils] - at com.dataiku.dip.spark.StdDataikuSparkContext.save(StdDataikuSparkContext.scala:58) ~[dss-spark-main_2.12-5.1.jar:?] [09:46:42] [INFO] [dku.utils] - at com.dataiku.dip.spark.StdDataikuSparkContext.savePyDataFrame(StdDataikuSparkContext.scala:611) ~[dss-spark-main_2.12-5.1.jar:?] [09:46:42] [INFO] [dku.utils] - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_412] [09:46:42] [INFO] [dku.utils] - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_412] [09:46:42] [INFO] [dku.utils] - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_412] [09:46:42] [INFO] [dku.utils] - at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_412] [09:46:42] [INFO] [dku.utils] - at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) ~[py4j-0.10.9.7.jar:?] [09:46:42] [INFO] [dku.utils] - at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) ~[py4j-0.10.9.7.jar:?] [09:46:42] [INFO] [dku.utils] - at py4j.Gateway.invoke(Gateway.java:282) ~[py4j-0.10.9.7.jar:?] [09:46:42] [INFO] [dku.utils] - at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) ~[py4j-0.10.9.7.jar:?] [09:46:42] [INFO] [dku.utils] - at py4j.commands.CallCommand.execute(CallCommand.java:79) ~[py4j-0.10.9.7.jar:?] [09:46:42] [INFO] [dku.utils] - at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) ~[py4j-0.10.9.7.jar:?] [09:46:42] [INFO] [dku.utils] - at py4j.ClientServerConnection.run(ClientServerConnection.java:106) ~[py4j-0.10.9.7.jar:?] [09:46:42] [INFO] [dku.utils] - at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_412] ..... [09:46:46] [INFO] [dku.utils] - *************** Recipe code failed ************** [09:46:46] [INFO] [dku.utils] - Begin Python stack [09:46:46] [INFO] [dku.utils] - Traceback (most recent call last): [09:46:46] [INFO] [dku.utils] - [2024/07/17-09:46:46.581] [Thread-6] [INFO] [org.apache.spark.SparkContext] - SparkContext is stopping with exitCode 0. 
[09:46:46] [INFO] [dku.utils] - File "/data/dataiku/dss_data/jobs/PSTRAINFREUDAILY/Scenario_build_ps_3ans__NP___s_3ans__NP___dv_3ans__NP___c_3ans__NP___v_3ans__NP___t_3ans__NP__2024-07-17T09-40-06.948/compute_s_3ans_NP/sparkbased-recipe/outzQEdlS58jTxO/python-exec-wrapper.py", line 204, in <module> [09:46:46] [INFO] [dku.utils] - exec(f.read()) [09:46:46] [INFO] [dku.utils] - File "<string>", line 36, in <module> [09:46:46] [INFO] [dku.utils] - File "/opt/dataiku-dss-12.6.4/python/dataiku/spark/__init__.py", line 198, in write_with_schema [09:46:46] [INFO] [dku.utils] - write_dataframe(dataset, dataframe, delete_first) [09:46:46] [INFO] [dku.utils] - File "/opt/dataiku-dss-12.6.4/python/dataiku/spark/__init__.py", line 188, in write_dataframe [09:46:46] [INFO] [dku.utils] - dsc.savePyDataFrame(dataset.full_name, dataframe._jdf, dataset.writePartition, delete_first and "OVERWRITE" or "APPEND", False) [09:46:46] [INFO] [dku.utils] - File "/opt/dataiku-dss-12.6.4/spark-standalone-home/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__ [09:46:46] [INFO] [dku.utils] - return_value = get_return_value( [09:46:46] [INFO] [dku.utils] - File "/opt/dataiku-dss-12.6.4/spark-standalone-home/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco [09:46:46] [INFO] [dku.utils] - return f(*a, **kw) [09:46:46] [INFO] [dku.utils] - File "/opt/dataiku-dss-12.6.4/spark-standalone-home/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value [09:46:46] [INFO] [dku.utils] - raise Py4JJavaError( [09:46:46] [DEBUG] [dku.spark.k8s] - Found savePyDataFrame failure in log: py4j.protocol.Py4JJavaError: An error occurred while calling o123.savePyDataFrame. [09:46:46] [INFO] [dku.utils] - py4j.protocol.Py4JJavaError: An error occurred while calling o123.savePyDataFrame. [09:46:46] [DEBUG] [dku.spark.k8s] - Checking line after the savePyDataFrame failure in log: : org.apache.spark.SparkException: Job aborted due to stage failure: Authorized committer (attemptNumber=0, stage=2, partition=715) failed; but task commit success, data duplication may happen. reason=ExecutorLostFailure(1,false,Some(The executor with id 1 was deleted by a user or the framework.)) [09:46:46] [INFO] [dku.utils] - [2024/07/17-09:46:46.591] [Thread-6] [INFO] [org.sparkproject.jetty.server.AbstractConnector] - Stopped Spark@53a23618{HTTP/1.1, (http/1.1)}{0.0.0.0:4041} [09:46:46] [INFO] [dku.utils] - : org.apache.spark.SparkException: Job aborted due to stage failure: Authorized committer (attemptNumber=0, stage=2, partition=715) failed; but task commit success, data duplication may happen. 
reason=ExecutorLostFailure(1,false,Some(The executor with id 1 was deleted by a user or the framework.)) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2785) [09:46:46] [INFO] [dku.utils] - [2024/07/17-09:46:46.595] [Thread-6] [INFO] [org.apache.spark.ui.SparkUI] - Stopped Spark web UI at http://ip-100-73-16-11.eu-west-3.compute.internal:4041 [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2721) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2720) [09:46:46] [INFO] [dku.utils] - at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) [09:46:46] [INFO] [dku.utils] - at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) [09:46:46] [INFO] [dku.utils] - at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2720) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleStageFailed$1(DAGScheduler.scala:1199) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleStageFailed$1$adapted(DAGScheduler.scala:1199) [09:46:46] [INFO] [dku.utils] - at scala.Option.foreach(Option.scala:407) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.handleStageFailed(DAGScheduler.scala:1199) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2981) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2923) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2912) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:971) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.SparkContext.runJob(SparkContext.scala:2263) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$4(FileFormatWriter.scala:307) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.FileFormatWriter$.writeAndCommit(FileFormatWriter.scala:271) [09:46:46] [INFO] [dku.utils] - [2024/07/17-09:46:46.599] [Thread-6] [INFO] [org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend] - Shutting down all executors [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeWrite(FileFormatWriter.scala:304) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:190) [09:46:46] [INFO] [dku.utils] - [2024/07/17-09:46:46.600] [dispatcher-CoarseGrainedScheduler] [INFO] [org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint] - Asking each executor to shut down [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:190) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113) [09:46:46] [INFO] [dku.utils] - at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94) [09:46:46] [INFO] [dku.utils] - [2024/07/17-09:46:46.604] [Thread-6] [WARN] [org.apache.spark.scheduler.cluster.k8s.ExecutorPodsWatchSnapshotSource] - Kubernetes client has been closed. [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:133) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:856) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:387) [09:46:46] [INFO] [dku.utils] - at 
org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:360) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239) [09:46:46] [INFO] [dku.utils] - at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:789) [09:46:46] [INFO] [dku.utils] - at com.dataiku.dip.spark.ParquetWriter$.saveParquetDataset(ParquetWriter.scala:61) [09:46:46] [INFO] [dku.utils] - at com.dataiku.dip.spark.StdDataikuSparkContext.saveHDFSableWithFastPathIfPossible(StdDataikuSparkContext.scala:898) [09:46:46] [INFO] [dku.utils] - at com.dataiku.dip.spark.StdDataikuSparkContext.internalSave(StdDataikuSparkContext.scala:810) [09:46:46] [INFO] [dku.utils] - at com.dataiku.dss.spark.DataikuSparkContext.save(DataikuSparkContext.scala:83) [09:46:46] [INFO] [dku.utils] - at com.dataiku.dss.spark.DataikuSparkContext.save$(DataikuSparkContext.scala:75) [09:46:46] [INFO] [dku.utils] - at com.dataiku.dip.spark.StdDataikuSparkContext.save(StdDataikuSparkContext.scala:58) [09:46:46] [INFO] [dku.utils] - at com.dataiku.dip.spark.StdDataikuSparkContext.savePyDataFrame(StdDataikuSparkContext.scala:611) [09:46:46] [INFO] [dku.utils] - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [09:46:46] [INFO] [dku.utils] - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [09:46:46] [INFO] [dku.utils] - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [09:46:46] [INFO] [dku.utils] - at java.lang.reflect.Method.invoke(Method.java:498) [09:46:46] [INFO] [dku.utils] - at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) [09:46:46] [INFO] [dku.utils] - at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) [09:46:46] [INFO] [dku.utils] - at py4j.Gateway.invoke(Gateway.java:282) [09:46:46] [INFO] [dku.utils] - at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) [09:46:46] [INFO] [dku.utils] - at py4j.commands.CallCommand.execute(CallCommand.java:79) [09:46:46] [INFO] [dku.utils] - at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) [09:46:46] [INFO] [dku.utils] - at py4j.ClientServerConnection.run(ClientServerConnection.java:106) [09:46:46] [INFO] [dku.utils] - at java.lang.Thread.run(Thread.java:750)

I tried changing the Spark configuration to get the same behaviour as before the upgrade:

spark.executor.memory = "10g"
spark.sql.shuffle.partitions = 2000
spark.kubernetes.memoryOverheadFactor = 0.1
spark.port.maxRetries = 200
spark.kubernetes.executor.deleteOnTermination = True
spark.kubernetes.allocation.batch.size = 5
spark.executor.cores = 2
spark.dynamicAllocation.maxExecutors = 32
spark.dynamicAllocation.enabled = False
spark.shuffle.service.enabled = False
spark.submit.deployMode = "client"
spark.dynamicAllocation.minExecutors = 9
spark.dynamicAllocation.shuffleTracking.enabled = True
spark.files.maxPartitionBytes = 512
spark.executor.instances = 21
spark.kubernetes.container.image.pullPolicy = "Always"
spark.jars.repositories = "https://artifact.socrate.vsct.fr/artifactory/mavencentral-mvn-all-remote"
spark.jars.packages = "org.apache.spark:spark-hadoop-cloud_2.12:3.2.2"
spark.sql.parquet.int96RebaseModeInWrite = "CORRECTED"
spark.sql.parquet.int96RebaseModeInRead = "CORRECTED"
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version = 2
spark.sql.parquet.enableVectorizedReader = False
spark.sql.parquet.writeLegacyFormat = True
spark.sql.parquet.binaryAsString = True
spark.storage.decommission.enabled = True
spark.storage.decommission.shuffleBlocks.enabled = True
spark.storage.decommission.rddBlocks.enabled = True
spark.decommission.enabled = True
spark.jars.ivy = "/data/dataiku/.ivy2"
spark.sql.sources.commitProtocolClass = "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol"
spark.speculation = True
spark.speculation.quantile = 0.90
spark.speculation.multiplier = 1.5
spark.task.maxFailures = 10
spark.sql.parquet.output.committer.class = "org.apache.parquet.hadoop.ParquetOutputCommitter"
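For reference, this is roughly how the main settings above would look if they were applied programmatically when building the session. This is only a minimal, illustrative PySpark sketch: in Dataiku DSS these properties are normally set in the recipe's Spark configuration rather than in recipe code, and the application name below is just a placeholder.

from pyspark.sql import SparkSession

# Illustrative sketch only: a subset of the settings listed above, applied
# directly on the SparkSession builder. Executor-level settings must be in
# place before the application starts, which is why DSS applies them through
# its Spark configuration rather than inside the recipe.
spark = (
    SparkSession.builder
    .appName("spark34-upgrade-test")  # placeholder name
    .config("spark.executor.memory", "10g")
    .config("spark.executor.cores", "2")
    .config("spark.executor.instances", "21")
    .config("spark.dynamicAllocation.enabled", "false")
    .config("spark.sql.shuffle.partitions", "2000")
    .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
    .config("spark.sql.sources.commitProtocolClass",
            "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol")
    .config("spark.sql.parquet.output.committer.class",
            "org.apache.parquet.hadoop.ParquetOutputCommitter")
    .getOrCreate()
)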
I need your help to solve this problem, please. Thank you.

*GABOUJ Wafa*
*Data Engineer*
*Tél: +33612693579*