You can strip the leading <1000> first and then parse the rest of the
string into a map with str_to_map (pairs are split on spaces, keys and
values on '='). From that map you can look up each key and turn it into
a separate column:
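import org.apache.spark.sql.functions.{expr, regexp_replace}
import spark.implicits._  // assumes a SparkSession named `spark`, as in spark-shell

Seq("<1000> date=2020-08-01 time=20:50:04 name=processing id=123 " +
    "session=new packt=20 orgin=null address=null dest=fgjglgl")
  .toDF("string")
  // drop the leading "<1000> " token
  .withColumn("key-values", regexp_replace($"string", "^[^ ]+ ", ""))
  // split pairs on ' ' and each key from its value on '='
  .withColumn("map", expr("str_to_map(`key-values`, ' ', '=')"))
  .select(
    $"map"("date").as("date"),
    $"map"("time").as("time"),
    $"map"("name").as("name"),
    $"map"("id").as("id"),
    $"map"("session").as("session"),
    $"map"("packt").as("packt"),
    // the data spells this key "orgin"; looking up "origin" would return null
    $"map"("orgin").as("origin"),
    $"map"("address").as("address"),
    $"map"("dest").as("dest")
  )
  .show(false)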
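Since the question asks about regexp_extract specifically, here is a minimal sketch of that route as well. It assumes the set of keys is known up front and that values never contain spaces; the object name KeyValueRegexSketch, the pattern helper and the local SparkSession setup are only illustrative. Note that regexp_extract returns an empty string when a key is missing, whereas the map lookup above returns null.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_extract}

object KeyValueRegexSketch {
  // Hypothetical helper: regex that captures the value following "<key>=".
  // The "(?:^| )" prefix keeps "id" from matching inside a longer key.
  // Assumes the key itself contains no regex metacharacters.
  def pattern(key: String): String = s"(?:^| )$key=([^ ]*)"

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("kv-regex-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("<1000> date=2020-08-01 time=20:50:04 name=processing id=123 " +
      "session=new packt=20 orgin=null address=null dest=fgjglgl").toDF("string")

    // Keys exactly as they are spelled in the data (including "orgin").
    val keys = Seq("date", "time", "name", "id", "session",
      "packt", "orgin", "address", "dest")

    // Add one column per key, each filled from capture group 1 of its pattern.
    val extracted = keys.foldLeft(df) { (acc, key) =>
      acc.withColumn(key, regexp_extract(col("string"), pattern(key), 1))
    }

    extracted.drop("string").show(false)
    spark.stop()
  }
}
```

This scans the string once per key, so the str_to_map variant is probably the better fit when there are many keys or when the keys are not known in advance.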
Enrico
On 09.08.20 at 18:00, anbutech wrote:
Hi All,
I have the following info in the data column:
<1000> date=2020-08-01 time=20:50:04 name=processing id=123 session=new
packt=20 orgin=null address=null dest=fgjglgl
Here I want to create a separate column for each of the key-value pairs
above, which follow the integer <1000> and are separated by spaces.
Is there any way to achieve this using regexp_extract or other built-in
functions? I don't want to do it with a UDF.
Thanks
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org