zhengruifeng commented on code in PR #49338:
URL: https://github.com/apache/spark/pull/49338#discussion_r1899981300


##########
python/pyspark/sql/functions/builtin.py:
##########
@@ -15341,12 +15341,15 @@ def regexp_count(str: "ColumnOrName", regexp: "ColumnOrName") -> Column:
 
     Examples
     --------
+    >>> from pyspark.sql import functions as sf
     >>> df = spark.createDataFrame([("1a 2b 14m", r"\d+")], ["str", "regexp"])
-    >>> df.select(regexp_count('str', lit(r'\d+')).alias('d')).collect()
+    >>> df.select(sf.regexp_count('str', lit(r'\d+')).alias('d')).collect()

Review Comment:
   Let's make the following changes:
   1. Replace `collect()` with `show()`, and drop `alias('d')` so the example also tests the default output column name.
   2. Also `show` the input columns by adding `'*'`.
   3. Also add the `sf.` prefix to the other built-in PySpark functions, e.g. `lit`.
   
   ```suggestion
       >>> df.select('*', sf.regexp_count('str', sf.lit(r'\d+'))).show()
   ```
   
   Please also apply these changes to the other examples.
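
One way to sanity-check the count that the rewritten example should display, without starting Spark, is to count the matches with Python's `re` module. This is only a sketch: `count_matches` is a hypothetical helper, not part of PySpark, and Spark evaluates patterns with Java's regex engine, so agreement is assumed here only for simple patterns such as `\d+`.

```python
import re


def count_matches(text: str, pattern: str) -> int:
    """Count non-overlapping matches of `pattern` in `text`.

    Hypothetical helper for cross-checking a doc example by hand;
    it is not a PySpark API.
    """
    return sum(1 for _ in re.finditer(pattern, text))


# Same row as the docstring example: ("1a 2b 14m", r"\d+")
row = ("1a 2b 14m", r"\d+")
count = count_matches(*row)
print(count)
```

The authoritative output for the docstring should still come from running the example in Spark itself.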



##########
python/pyspark/sql/functions/builtin.py:
##########
@@ -15406,9 +15414,9 @@ def regexp_extract_all(
 
     Parameters
     ----------
-    str : :class:`~pyspark.sql.Column` or str
+    str : :class:`~pyspark.sql.Column` or column name
         target column to work on.
-    regexp : :class:`~pyspark.sql.Column` or str
+    regexp : :class:`~pyspark.sql.Column` or column name
         regex pattern to apply.
     idx : int, optional

Review Comment:
   ```suggestion
       idx : :class:`~pyspark.sql.Column` or int, optional
   ```



##########
python/pyspark/sql/functions/builtin.py:
##########
@@ -15341,12 +15341,15 @@ def regexp_count(str: "ColumnOrName", regexp: "ColumnOrName") -> Column:
 
     Examples
     --------
+    >>> from pyspark.sql import functions as sf
     >>> df = spark.createDataFrame([("1a 2b 14m", r"\d+")], ["str", "regexp"])
-    >>> df.select(regexp_count('str', lit(r'\d+')).alias('d')).collect()
+    >>> df.select(sf.regexp_count('str', lit(r'\d+')).alias('d')).collect()

Review Comment:
   You can generate the new output for each example with the `bin/pyspark` REPL:
   
   <img width="1450" alt="image" src="https://github.com/user-attachments/assets/f7cc9cc4-9c7a-406d-8a4a-c5ed399f111c" />
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

