Hi Khalid,

See https://issues.apache.org/jira/browse/SPARK-31628.

It might just be syntactic sugar over the StringReplace class <https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L662>, but it makes things a little easier and neater. There are a lot of such missing APIs in Scala and Python.

Regards,
Vibhor

________________________________
From: russell.spit...@gmail.com <russell.spit...@gmail.com>
Sent: Monday, October 3, 2022 12:31 AM
To: Khalid Mammadov <khalidmammad...@gmail.com>
Cc: dev <dev@spark.apache.org>
Subject: EXT: Re: Missing string replace function

Ah, for that I think it makes sense to add a function, but it probably should not be an alias for regex replace, since that has very different semantics for certain string arguments.

On Oct 2, 2022, at 1:31 PM, Khalid Mammadov <khalidmammad...@gmail.com> wrote:

Thanks Russell for checking this out!

This is a good example of a replace that is available in Spark SQL but, unfortunately, in neither the PySpark API nor the Scala API. The alternative already mentioned is regexp_replace, but as developers looking for a replace function we tend to skip the regex version because it is not what we are looking for, only to realise later that there is no built-in replace utility function and that we have to use the regexp alternative after all.

So, to give an example, it is currently possible to do something like this:

scala> val df = Seq("aaa zzz").toDF
df: org.apache.spark.sql.DataFrame = [value: string]

scala> df.select(expr("replace(value, 'aaa', 'bbb')")).show()
+------------------------+
|replace(value, aaa, bbb)|
+------------------------+
|                 bbb zzz|
+------------------------+

But not this:

df.select(replace('value, "aaa", "ooo")).show()

as the replace function is not available in the functions modules of either PySpark or Scala.
And this is the output from my local prototype, which would be good to see in the official API:

scala> df.select(replace('value, "aaa", "ooo")).show()
+----------------------------------+
|regexp_replace(value, aaa, ooo, 1)|
+----------------------------------+
|                           ooo zzz|
+----------------------------------+

WDYT?

On Sun, Oct 2, 2022 at 6:24 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

A quick test on 3.2 confirms everything should be working as expected:

scala> spark.createDataset(Seq(("foo", "bar")))
res0: org.apache.spark.sql.Dataset[(String, String)] = [_1: string, _2: string]

scala> spark.createDataset(Seq(("foo", "bar"))).createTempView("temp")

scala> spark.sql("SELECT replace(_1, 'fo', 'bo') from temp").show
+-------------------+
|replace(_1, fo, bo)|
+-------------------+
|                boo|
+-------------------+

On Oct 2, 2022, at 12:21 PM, Russell Spitzer <russell.spit...@gmail.com> wrote:

https://spark.apache.org/docs/3.3.0/api/sql/index.html#replace

This was added in Spark 2.3.0 as far as I can tell.
https://github.com/apache/spark/pull/18047

On Oct 2, 2022, at 11:19 AM, Khalid Mammadov <khalidmammad...@gmail.com> wrote:

Hi,

As you know, there is no string "replace" function in pyspark.sql.functions for PySpark, nor in org.apache.spark.sql.functions for Scala/Java, and I was wondering why that is. I know there is regexp_replace instead, as well as na.replace, or SQL with expr. I think it is one of the fundamental functions in a user's/developer's toolset and is available in almost every language.
It takes time for new Spark devs to realise it is not there and to find the alternatives. So I think it would be nice to have one. I already have a prototype for Scala (which is just sugar over regexp_replace) and it works like a charm :) I would like to know your opinion: is it worth contributing, or is it not needed?

Thanks,
Khalid
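
Russell's caution in this thread, that a plain replace should probably not be an alias for a regex replace, can be illustrated without Spark at all. The sketch below is only an analogy in plain Python: it uses `str.replace` and `re.sub` as stand-ins, and the three helper names are hypothetical, not part of any Spark or PySpark API. Spark's regexp_replace is based on Java regular expressions, so the exact metacharacter and replacement-string rules may differ from what is shown here.

```python
import re


def literal_replace(text: str, search: str, replacement: str) -> str:
    """Replace every occurrence of `search`, treating it as plain text."""
    return text.replace(search, replacement)


def regex_replace(text: str, pattern: str, replacement: str) -> str:
    """Replace every regex match of `pattern`; metacharacters are interpreted."""
    return re.sub(pattern, replacement, text)


def escaped_regex_replace(text: str, search: str, replacement: str) -> str:
    """Emulate a literal replace on top of a regex engine (hypothetical helper).

    re.escape neutralises metacharacters in `search`; passing a function as
    the replacement keeps backslashes and group references in `replacement`
    from being interpreted.
    """
    return re.sub(re.escape(search), lambda _match: replacement, text)


value = "a.b axb"
print(literal_replace(value, "a.b", "X"))        # X axb
print(regex_replace(value, "a.b", "X"))          # X X  ('.' matches any character)
print(escaped_regex_replace(value, "a.b", "X"))  # X axb
```

For inputs such as "aaa" the literal and regex versions agree, which is why a thin alias appears to work in a quick test; they diverge as soon as the search string contains characters like `.`, `*` or `(`, so an alias would need escaping along the lines of the third helper.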