Hi Khalid,

See https://issues.apache.org/jira/browse/SPARK-31628.

It might just be syntactic sugar over the 
StringReplace<https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L662>
 class, but it makes things a little easier and neater.

There are a lot of such missing APIs in Scala and Python.

Regards,
Vibhor


________________________________
From: russell.spit...@gmail.com <russell.spit...@gmail.com>
Sent: Monday, October 3, 2022 12:31 AM
To: Khalid Mammadov <khalidmammad...@gmail.com>
Cc: dev <dev@spark.apache.org>
Subject: EXT: Re: Missing string replace function

Ah, for that I think it makes sense to add a function, but it probably should 
not be an alias for regexp_replace, since that has very different semantics 
for certain string arguments.
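
To illustrate the difference in semantics, here is a minimal sketch in plain 
Python rather than Spark. It assumes Python's re module as a stand-in for the 
regex engine behind regexp_replace (the two engines differ in details), and 
the sample strings are made up for illustration:

```python
import re

text = "a.c abc"

# Literal replacement: only the exact substring "a.c" is replaced.
literal = text.replace("a.c", "X")

# Regex replacement: "." matches any character, so "abc" is replaced as well.
regex = re.sub("a.c", "X", text)

print(literal)  # X abc
print(regex)    # X X
```

The same search string gives two different results, which is why a plain 
alias would likely surprise users who pass strings containing regex 
metacharacters.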

On Oct 2, 2022, at 1:31 PM, Khalid Mammadov <khalidmammad...@gmail.com> wrote:


Thanks Russell for checking this out!

This is a good example of a replace function that is available in Spark SQL 
but, unfortunately, not in the PySpark API or the Scala API.
The alternative mentioned is regexp_replace, but a developer looking for a 
replace function tends to ignore the regex version because it's not what we 
usually look for, then realises there is no built-in replace utility function 
and has to use the regexp alternative anyway.

So, to give an example, it is possible now to do something like this:
scala> val df = Seq("aaa zzz").toDF
df: org.apache.spark.sql.DataFrame = [value: string]
scala> df.select(expr("replace(value, 'aaa', 'bbb')")).show()
+------------------------+
|replace(value, aaa, bbb)|
+------------------------+
|                 bbb zzz|
+------------------------+

But not this:
df.select(replace('value, "aaa", "ooo")).show()
because the replace function is not available in the functions module of 
either PySpark or Scala.

And this is the output from my local prototype which would be good to see in 
the official API:
scala> df.select(replace('value, "aaa", "ooo")).show()
+----------------------------------+
|regexp_replace(value, aaa, ooo, 1)|
+----------------------------------+
|                           ooo zzz|
+----------------------------------+

WDYT?


On Sun, Oct 2, 2022 at 6:24 PM Russell Spitzer <russell.spit...@gmail.com> 
wrote:
A quick test on 3.2 confirms everything should be working as expected.

scala> spark.createDataset(Seq(("foo", "bar")))
res0: org.apache.spark.sql.Dataset[(String, String)] = [_1: string, _2: string]

scala> spark.createDataset(Seq(("foo", "bar"))).createTempView("temp")

scala> spark.sql("SELECT replace(_1, 'fo', 'bo') from temp").show
+-------------------+
|replace(_1, fo, bo)|
+-------------------+
|                boo|
+-------------------+

On Oct 2, 2022, at 12:21 PM, Russell Spitzer <russell.spit...@gmail.com> 
wrote:

https://spark.apache.org/docs/3.3.0/api/sql/index.html#replace

This was added in Spark 2.3.0 as far as I can tell.

https://github.com/apache/spark/pull/18047

On Oct 2, 2022, at 11:19 AM, Khalid Mammadov <khalidmammad...@gmail.com> 
wrote:

Hi,

As you know, there's no string "replace" function in pyspark.sql.functions 
for PySpark or in org.apache.spark.sql.functions for Scala/Java, and I was 
wondering why that is. I know there's regexp_replace instead, as well as 
na.replace or SQL with expr.

I think it's one of the fundamental functions in a user's or developer's 
toolset and is available in almost every language. It takes time for new Spark 
devs to realise it's not there and to use the alternatives. So, I think it 
would be nice to have one.
I already have a prototype for Scala (which is just sugar over 
regexp_replace) and it works like a charm :)
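
As a rough illustration of what sugar over a regex replace has to handle, 
here is a minimal sketch in plain Python (not the Scala prototype itself). The 
helper name literal_replace_via_regex is hypothetical, Python's re module 
stands in for Spark's regex engine, and returning the input unchanged for an 
empty search string is an assumption of this sketch:

```python
import re


def literal_replace_via_regex(text: str, search: str, replacement: str) -> str:
    """Replace every literal occurrence of `search` using a regex engine."""
    # Assumption of this sketch: an empty search string leaves the input as-is.
    if not search:
        return text
    # Escape regex metacharacters so the search string is matched literally.
    pattern = re.escape(search)
    # Use a callable so the replacement is inserted verbatim, without any
    # backreference or escape processing.
    return re.sub(pattern, lambda _match: replacement, text)


print(literal_replace_via_regex("aaa zzz", "aaa", "ooo"))  # ooo zzz
print(literal_replace_via_regex("a.c abc", "a.c", "X"))    # X abc
```

Both the search string and the replacement need to be neutralised. Escaping 
only the search string would still leave group references in the replacement 
open to interpretation.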

I would like to know your opinion: is this worth contributing, or is it not 
needed?

Thanks
Khalid


