Thanks Russell for checking this out! This is a good example of a *replace* that is available in Spark SQL but, unfortunately, not in the PySpark or Scala APIs. Another alternative is the mentioned regexp_replace, but as developers looking for a *replace* function we tend to ignore the regex version, since it's not what we usually look for, and only later realise there is no built-in replace utility function and we have to fall back on the regexp alternative.
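(To illustrate the pitfall with using regexp_replace as a stand-in for a literal replace, here is a small plain-Scala sketch using java.util.regex directly, which is the same machinery Spark's regexp_replace builds on: regex metacharacters in the search string silently change what gets matched.)

```scala
import java.util.regex.Pattern

// Naive use: "." is a regex metacharacter, so the pattern "1.5"
// also matches "1x5" -- not what a literal replace should do.
val naive = "1.5 1x5".replaceAll("1.5", "X")                  // "X X"

// Quoting the search string makes the match literal.
val literal = "1.5 1x5".replaceAll(Pattern.quote("1.5"), "X") // "X 1x5"

println(naive)
println(literal)
```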
So, to give an example, it is possible now to do something like this:

> scala> val df = Seq("aaa zzz").toDF
> df: org.apache.spark.sql.DataFrame = [value: string]
>
> scala> df.select(expr("replace(value, 'aaa', 'bbb')")).show()
> +------------------------+
> |replace(value, aaa, bbb)|
> +------------------------+
> |                 bbb zzz|
> +------------------------+

But not this:

> df.select(replace('value, "aaa", "ooo")).show()

as the *replace* function is not available in the functions module of either PySpark or Scala. And this is the output from my local prototype, which would be good to see in the official API:

> scala> df.select(replace('value, "aaa", "ooo")).show()
> +----------------------------------+
> |regexp_replace(value, aaa, ooo, 1)|
> +----------------------------------+
> |                           ooo zzz|
> +----------------------------------+

WDYT?

On Sun, Oct 2, 2022 at 6:24 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

> Quick test on 3.2 confirms everything should be working as expected
>
> scala> spark.createDataset(Seq(("foo", "bar")))
> res0: org.apache.spark.sql.Dataset[(String, String)] = [_1: string, _2: string]
>
> scala> spark.createDataset(Seq(("foo", "bar"))).createTempView("temp")
>
> scala> spark.sql("SELECT replace(_1, 'fo', 'bo') from temp").show
> +-------------------+
> |replace(_1, fo, bo)|
> +-------------------+
> |                boo|
> +-------------------+
>
> On Oct 2, 2022, at 12:21 PM, Russell Spitzer <russell.spit...@gmail.com> wrote:
>
> https://spark.apache.org/docs/3.3.0/api/sql/index.html#replace
>
> This was added in Spark 2.3.0 as far as I can tell.
>
> https://github.com/apache/spark/pull/18047
>
> On Oct 2, 2022, at 11:19 AM, Khalid Mammadov <khalidmammad...@gmail.com> wrote:
>
> Hi,
>
> As you know, there's no string "replace" function inside pyspark.sql.functions for PySpark nor in org.apache.spark.sql.functions for Scala/Java, and I was wondering why that is. I know there's regexp_replace instead, and na.replace, or SQL with expr.
> I think it's one of the fundamental functions in a user's/developer's toolset, available in almost every language. It takes time for new Spark devs to realise it's not there and to fall back on the alternatives. So, I think it would be nice to have one.
>
> I already have a prototype for Scala (which is just sugar over regexp_replace) and it works like a charm :)
>
> Would like to know your opinion on whether it's worth contributing or not needed...
>
> Thanks
> Khalid
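(For reference, the "sugar over regexp_replace" prototype Khalid describes could be sketched roughly as below. This is a hypothetical sketch, not the actual prototype or part of the Spark API: the core is escaping the search string with Pattern.quote and the replacement with Matcher.quoteReplacement so both are treated literally.)

```scala
import java.util.regex.{Matcher, Pattern}

// Plain-Scala core of a literal replace built on regex machinery:
// Pattern.quote escapes metacharacters in the search string, and
// Matcher.quoteReplacement escapes "$" and "\" in the replacement.
def literalReplace(s: String, search: String, replacement: String): String =
  s.replaceAll(Pattern.quote(search), Matcher.quoteReplacement(replacement))

// A hypothetical Column version (assumed name and signature,
// not in the Spark API) would follow the same shape:
//   def replace(col: Column, search: String, repl: String): Column =
//     regexp_replace(col, Pattern.quote(search), Matcher.quoteReplacement(repl))

println(literalReplace("aaa zzz", "aaa", "ooo"))  // ooo zzz
```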