[ https://issues.apache.org/jira/browse/SPARK-49870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18021497#comment-18021497 ]
Max Payson commented on SPARK-49870:
------------------------------------

Hi [~gurwls223] [~dongjoon], we are trying to upgrade to Python 3.13, and PySpark is crashing locally on Windows. We think the crash is specific to the `createDataFrame` API; other operations seem to work. Please let me know if I should create a separate issue, or if we can help investigate further; the failure appears to be fairly low level.

Testing notes:
 * Reproducible with PySpark 4.0.1, 4.0.0, and 3.5.5
 * Reproducible with Python 3.12 and 3.13 (the last working version is 3.11.13)
 * Reproducible on all Windows machines we've tested
 * Using Java 17
 * Using Spark Classic locally (not Spark Connect)
 * PySpark installed with pip; Python and PySpark are the only installed dependencies

Reproduction:
{code:python}
import os
import sys

from pyspark.sql import SparkSession

# Make the Python workers use the same interpreter as the driver
os.environ["PYSPARK_PYTHON"] = sys.executable

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1,), (2,)], ["myint"])
df.show()
{code}

Stack trace (captured with "spark.python.worker.faulthandler.enabled" set to true; the trace is identical with it disabled):
{code:java}
Traceback (most recent call last):
  File "<your_script>.py", line 10, in <module>
    df.show()
  File "<pyspark>/sql/classic/dataframe.py", line 285, in show
    print(self._show_string(n, truncate, vertical))
  File "<pyspark>/sql/classic/dataframe.py", line 303, in _show_string
    return self._jdf.showString(n, 20, vertical)
  File "<py4j>/java_gateway.py", line 1362, in __call__
    return_value = get_return_value(
        answer, self.gateway_client, self.target_id, self.name)
  File "<pyspark>/errors/exceptions/captured.py", line 282, in deco
    return f(*a, **kw)
  File "<py4j>/protocol.py", line 327, in get_return_value
    raise Py4JJavaError(
        "An error occurred while calling {0}{1}{2}.\n".
        format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o48.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (executor driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:624)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:599)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35)
    at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:945)
    at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:925)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:532)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:601)
    at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583)
    at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:402)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:901)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:901)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:374)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:338)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)
    at org.apache.spark.scheduler.Task.run(Task.scala:147)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:647)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:80)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:77)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:650)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.io.EOFException
    at java.base/java.io.DataInputStream.readInt(DataInputStream.java:386)
    at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:933)
    ... 26 more
{code}
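In case it helps anyone reproduce, below is a minimal sketch of the variant we used to capture the trace above with the worker faulthandler turned on. The only change from the reproduction script is the extra `.config(...)` line; the config key is the standard "spark.python.worker.faulthandler.enabled" mentioned above, and nothing else here is specific to our environment:
{code:python}
import os
import sys

from pyspark.sql import SparkSession

# Same interpreter pinning as in the reproduction above
os.environ["PYSPARK_PYTHON"] = sys.executable

# With the faulthandler enabled, a crashing worker should dump its
# Python-level stack into the worker/executor logs before it dies;
# in our case the driver-side trace shown above did not change.
spark = (
    SparkSession.builder
    .config("spark.python.worker.faulthandler.enabled", "true")
    .getOrCreate()
)

spark.createDataFrame([(1,), (2,)], ["myint"]).show()
{code}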
> Support Python 3.13 in Spark Classic
> ------------------------------------
>
>                 Key: SPARK-49870
>                 URL: https://issues.apache.org/jira/browse/SPARK-49870
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build, PySpark
>    Affects Versions: 4.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>             Fix For: 4.0.0
>
> Basic tests pass with Python 3.13 for Spark Classic
> (https://github.com/apache/spark/actions/runs/11168860784)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)