xinrong-meng commented on code in PR #51006: URL: https://github.com/apache/spark/pull/51006#discussion_r2110394870
##########
python/pyspark/pandas/strings.py:
##########
@@ -2031,7 +2031,13 @@ def pudf(s: pd.Series) -> pd.Series:
         if expand:
             psdf = psser.to_frame()
             scol = psdf._internal.data_spark_columns[0]
-            spark_columns = [scol[i].alias(str(i)) for i in range(n + 1)]
+
+            if ps.get_option("compute.ansi_mode_support"):
+                spark_columns = [
+                    F.try_element_at(scol, F.lit(i + 1)).alias(str(i)) for i in range(n + 1)

Review Comment:
   Thanks for the suggestion! There shouldn't be a significant performance difference between creating `F.lit` inside the loop and hoisting it out beforehand: it just wraps a Python literal in a Spark expression, which isn't executed immediately (it's only a node in the DAG) and will be deduplicated by Catalyst. With that said, I'd like to keep the original for simplicity, but feel free to share if you have other opinions!

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
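[Editor's sketch] The "just nodes in the DAG" point in the review comment can be illustrated with a toy model of lazy expression trees. This is deliberately not pyspark (the `Lit` and `ElementAt` classes below are invented stand-ins for `F.lit` and `F.try_element_at`): it only shows that building a literal node inside a list comprehension versus hoisting it out produces identical expression objects, because nothing is evaluated at construction time.

```python
# Toy sketch, NOT pyspark: expression objects are plain DAG nodes.
# Constructing a literal node inside a loop vs. beforehand only changes
# where cheap Python objects are allocated, not what gets executed later.
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    """Stand-in for F.lit: wraps a Python literal as an expression node."""
    value: int


@dataclass(frozen=True)
class ElementAt:
    """Stand-in for F.try_element_at: a node referencing a literal child."""
    index: Lit


# Literal nodes created inline, as in the PR:
inline = [ElementAt(Lit(i + 1)) for i in range(3)]

# Literal nodes hoisted out of the comprehension:
lits = [Lit(i + 1) for i in range(3)]
hoisted = [ElementAt(lit) for lit in lits]

# Both spellings build the same expression trees; a real optimizer
# (Catalyst, in Spark's case) would see identical plans either way.
assert inline == hoisted
```

In real pyspark the analogous observation is that `F.lit(i + 1)` returns a `Column` wrapping an unevaluated literal expression, so hoisting it is a readability choice rather than an optimization.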