zhengruifeng commented on code in PR #49338:
URL: https://github.com/apache/spark/pull/49338#discussion_r1899981300


##########
python/pyspark/sql/functions/builtin.py:
##########
@@ -15341,12 +15341,15 @@ def regexp_count(str: "ColumnOrName", regexp: "ColumnOrName") -> Column:
 
     Examples
     --------
+    >>> from pyspark.sql import functions as sf
     >>> df = spark.createDataFrame([("1a 2b 14m", r"\d+")], ["str", "regexp"])
-    >>> df.select(regexp_count('str', lit(r'\d+')).alias('d')).collect()
+    >>> df.select(sf.regexp_count('str', lit(r'\d+')).alias('d')).collect()

Review Comment:
   Let's make the following changes:
   1. replace `collect()` with `show()`, and drop `alias('d')` so that the default output column name is also tested;
   2. also `show` the input columns by adding `'*'`;
   3. also add the `sf.` prefix to other built-in PySpark functions, e.g. `lit`;
   
   ```suggestion
       >>> df.select('*', sf.regexp_count('str', sf.lit(r'\d+'))).show()
   ```
   
   Please also make these changes in the other places.



##########
python/pyspark/sql/functions/builtin.py:
##########
@@ -15406,9 +15414,9 @@ def regexp_extract_all(
 
     Parameters
     ----------
-    str : :class:`~pyspark.sql.Column` or str
+    str : :class:`~pyspark.sql.Column` or column name
         target column to work on.
-    regexp : :class:`~pyspark.sql.Column` or str
+    regexp : :class:`~pyspark.sql.Column` or column name
         regex pattern to apply.
     idx : int, optional

Review Comment:
   ```suggestion
       idx : :class:`~pyspark.sql.Column` or int, optional
   ```



##########
python/pyspark/sql/functions/builtin.py:
##########
@@ -15341,12 +15341,15 @@ def regexp_count(str: "ColumnOrName", regexp: "ColumnOrName") -> Column:
 
     Examples
     --------
+    >>> from pyspark.sql import functions as sf
     >>> df = spark.createDataFrame([("1a 2b 14m", r"\d+")], ["str", "regexp"])
-    >>> df.select(regexp_count('str', lit(r'\d+')).alias('d')).collect()
+    >>> df.select(sf.regexp_count('str', lit(r'\d+')).alias('d')).collect()

Review Comment:
   You can generate the new output of each example with the `bin/pyspark` REPL:
   
   <img width="1450" alt="image" src="https://github.com/user-attachments/assets/f7cc9cc4-9c7a-406d-8a4a-c5ed399f111c" />
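
When regenerating the doctest output for the `regexp_count` example, the value in the result column can be cross-checked without a Spark session. The block below is only a minimal sketch: `py_regexp_count` is a hypothetical helper, not part of PySpark, and it uses Python's `re` module rather than the Java regex engine Spark uses. It assumes the two agree for a simple pattern such as `\d+`, which may not hold for more exotic patterns.

```python
import re


def py_regexp_count(text: str, pattern: str) -> int:
    """Count non-overlapping matches of `pattern` in `text`.

    Hypothetical stand-in for `sf.regexp_count`, for sanity checks only;
    it relies on Python's `re`, not on Spark's Java regex engine.
    """
    return sum(1 for _ in re.finditer(pattern, text))


# Same input row as the docstring example under review.
text, pattern = "1a 2b 14m", r"\d+"
count = py_regexp_count(text, pattern)
print(count)  # 3  (the matches are "1", "2" and "14")
```

The exact table printed by `show()`, including the default column name, should still be copied from a real `bin/pyspark` session rather than inferred from this sketch.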