You can strip the leading <1000> first and then parse the rest of the string into a map with str_to_map, treating it as space-separated key=value pairs. From that map you can look up each key and turn it into a separate column:

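// assumes a SparkSession named `spark` is in scope, as in spark-shell
import org.apache.spark.sql.functions.{expr, regexp_replace}
import spark.implicits._

Seq(("<1000> date=2020-08-01 time=20:50:04 name=processing id=123 session=new packt=20 orgin=null address=null dest=fgjglgl"))
  .toDF("string")
  // drop everything up to and including the first space, i.e. the "<1000> " prefix
  .withColumn("key-values", regexp_replace($"string", "^[^ ]+ ", ""))
  // split pairs on ' ' and keys from values on '='
  .withColumn("map", expr("str_to_map(`key-values`, ' ', '=')"))
  .select(
    $"map"("date").as("date"),
    $"map"("time").as("time"),
    $"map"("name").as("name"),
    $"map"("id").as("id"),
    $"map"("session").as("session"),
    $"map"("packt").as("packt"),
    $"map"("orgin").as("origin"),  // the sample data spells this key "orgin"
    $"map"("address").as("address"),
    $"map"("dest").as("dest")
  )
  .show(false)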
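
Since the question asks about regexp_extract specifically: a minimal sketch of that route, without a map and without a UDF. The object name RegexpExtractSketch, the helper patternFor and the local[1] master are made up for illustration, and the key list is simply copied from the sample line (including its "orgin" spelling), so adjust as needed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_extract}

object RegexpExtractSketch {
  // Hypothetical helper: regex for one key. The key must start the string
  // or follow a space, and its value runs up to the next space.
  // Keys are used as-is, so they must not contain regex metacharacters.
  def patternFor(key: String): String = s"(?:^| )${key}=([^ ]*)"

  def main(args: Array[String]): Unit = {
    // Local session only for this sketch; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("regexp-extract-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("<1000> date=2020-08-01 time=20:50:04 name=processing id=123 session=new packt=20 orgin=null address=null dest=fgjglgl")
      .toDF("string")

    // Key names as they appear in the sample data.
    val keys = Seq("date", "time", "name", "id", "session", "packt", "orgin", "address", "dest")

    // One regexp_extract call per key; capture group 1 holds the value.
    val columns = keys.map(k => regexp_extract(col("string"), patternFor(k), 1).as(k))

    df.select(columns: _*).show(false)

    spark.stop()
  }
}
```

One difference to keep in mind: regexp_extract returns an empty string when the pattern does not match, whereas looking up a missing key in the map gives null.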

Enrico


On 09.08.20 at 18:00, anbutech wrote:
Hi All,

I have the following info in the data column.

<1000> date=2020-08-01 time=20:50:04 name=processing id=123 session=new
packt=20 orgin=null address=null dest=fgjglgl

Here I want to create a separate column for each of the above key-value
pairs, which follow the integer <1000> and are separated by spaces.
Is there any way to achieve this using built-in functions such as
regexp_extract? I don't want to do it with a UDF.
Apart from a UDF, is there any way to achieve it?


Thanks



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org


