allisonwang-db commented on code in PR #49535: URL: https://github.com/apache/spark/pull/49535#discussion_r1924245423
########## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ########## @@ -3459,6 +3459,16 @@ object SQLConf { .checkValues(Set("legacy", "row", "dict")) .createWithDefaultString("legacy") + val PYSPARK_HIDE_TRACEBACK = + buildConf("spark.sql.execution.pyspark.udf.hideTraceback.enabled") + .doc( + "When true, only show the message of the exception from Python UDFs, " + + "hiding stack trace and exception type." + ) Review Comment: nit ```suggestion "hiding stack trace and exception type.") ``` ########## core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala: ########## @@ -122,6 +122,7 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( protected val authSocketTimeout = conf.get(PYTHON_AUTH_SOCKET_TIMEOUT) private val reuseWorker = conf.get(PYTHON_WORKER_REUSE) protected val faultHandlerEnabled: Boolean = conf.get(PYTHON_WORKER_FAULTHANLDER_ENABLED) + protected val hideTraceback: Boolean = false Review Comment: Can we use `conf.get(PYSPARK_HIDE_TRACEBACK)` here so that we don't need to override every subclass? ########## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ########## @@ -3459,6 +3459,16 @@ object SQLConf { .checkValues(Set("legacy", "row", "dict")) .createWithDefaultString("legacy") + val PYSPARK_HIDE_TRACEBACK = + buildConf("spark.sql.execution.pyspark.udf.hideTraceback.enabled") + .doc( + "When true, only show the message of the exception from Python UDFs, " + + "hiding stack trace and exception type." + ) Review Comment: I think we should only hide stack trace but not the Python error class (exception type)? ########## python/pyspark/util.py: ########## @@ -468,16 +468,19 @@ def handle_worker_exception(e: BaseException, outfile: IO) -> None: and exception traceback info to outfile. JVM could then read from the outfile and perform exception handling there. """ - try: - exc_info = None + + def format_exception() -> str: + if os.environ.get("SPARK_HIDE_TRACEBACK", False): + return "".join(traceback.format_exception_only(type(e), e)) Review Comment: Instead of checking this environment variable, can we add a parameter to this `handle_worker_exception` method? Something like `handle_worker_exception(..., hide_traceback)`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org