[ 
https://issues.apache.org/jira/browse/SPARK-16324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-16324:
------------------------------
    Issue Type: Improvement  (was: Bug)
       Summary: regexp_extract should doc that it returns empty string when 
match fails  (was: regexp_extract returns empty string when match fails)

> regexp_extract should doc that it returns empty string when match fails
> -----------------------------------------------------------------------
>
>                 Key: SPARK-16324
>                 URL: https://issues.apache.org/jira/browse/SPARK-16324
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.0.0
>            Reporter: Max Moroz
>            Priority: Minor
>
> The documentation for regexp_extract isn't clear about how it behaves when 
> the regex doesn't match the row. However, the Java documentation it refers to 
> for further detail suggests that the return value should be null if the group 
> wasn't matched at all, an empty string if the group actually matched an empty 
> string, and that an exception should be raised if the entire regex didn't match.
> This would be identical to how Python's own re module behaves when 
> MatchObject.group() is called.
> However, in practice regexp_extract() returns an empty string when the match 
> fails. This seems to be a bug; if it was intended as a feature, it should 
> have been documented as such, and it was probably not a good idea since it 
> can result in silent bugs.
> {code}
> import pyspark.sql.functions as F
> df = spark.createDataFrame([['abc']], ['text'])
> assert df.select(F.regexp_extract('text', r'(z)', 1)).first()[0] == ''
> {code}
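
For comparison, the three outcomes described above can be told apart in plain Python with the standard re module. This is only a sketch: extract_group is a hypothetical helper, not part of PySpark, and it relies on re.search returning None when the whole regex fails (calling .group() on that None is what raises an exception in Python).

```python
import re

def extract_group(pattern, text, idx):
    """Hypothetical helper: report how group `idx` behaved for `text`."""
    m = re.search(pattern, text)
    if m is None:
        # The entire regex did not match anywhere in the text.
        return ("no_match", None)
    value = m.group(idx)
    if value is None:
        # The regex matched, but this group did not participate.
        return ("group_unmatched", None)
    # The group participated; its value may still be an empty string.
    return ("matched", value)

print(extract_group(r'(z)', 'abc', 1))    # entire regex fails
print(extract_group(r'a(z)?', 'abc', 1))  # group did not participate
print(extract_group(r'a(z*)', 'abc', 1))  # group matched an empty string
print(extract_group(r'a(b)', 'abc', 1))   # group matched 'b'
```

By the report above, regexp_extract returns '' for the first case, so a caller cannot tell it apart from the third.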



