Hi,

I have a raw source data frame with two columns, as below:

timestamp    2019-11-29 9:30:45
message_log  <123>NOV 29 10:20:35 ips01 sfids: connection: tcp,bytes:104,user:unknown,url:unknown,host:127.0.0.1

How do we break each of the key/value pairs above out into a separate column using a UDF in PySpark? What is the right approach for flattening this type of log data: regex or plain Python logic? Could you please help me with the logic for flattening it?

The final output data frame should have the columns below:

timestamp   2019-11-29 9:30:45
prio        123
msg_ts      NOV 29 10:20:35
msg_ids     ips01 sfids
connection  tcp
bytes       104
user        unknown
url         unknown
host        127.0.0.1
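To make the question concrete, here is a minimal sketch of the kind of flattening being asked about, assuming the message layout above is fixed. The header fields (prio, msg_ts, msg_ids) are pulled out with the built-in regexp_extract, and the trailing key/value list is parsed with a small UDF. The regex pattern and the helper parse_kv are illustrative assumptions, not a tested solution:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import MapType, StringType

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("2019-11-29 9:30:45",
      "<123>NOV 29 10:20:35 ips01 sfids: connection: tcp,bytes:104,"
      "user:unknown,url:unknown,host:127.0.0.1")],
    ["timestamp", "message_log"],
)

# Assumed layout: <prio>msg_ts msg_ids: key:value,key:value,...
pattern = r"^<(\d+)>(\w+ \d+ \d+:\d+:\d+) (\S+ \S+): (.*)$"

# Hypothetical UDF: turns "k1:v1,k2:v2,..." into a map column.
@F.udf(MapType(StringType(), StringType()))
def parse_kv(s):
    pairs = (p.partition(":") for p in s.split(","))
    return {k.strip(): v.strip() for k, _, v in pairs}

df = (df
      .withColumn("prio",    F.regexp_extract("message_log", pattern, 1))
      .withColumn("msg_ts",  F.regexp_extract("message_log", pattern, 2))
      .withColumn("msg_ids", F.regexp_extract("message_log", pattern, 3))
      .withColumn("kv", parse_kv(F.regexp_extract("message_log", pattern, 4))))

# Promote each map entry to its own column, then drop the scaffolding.
for key in ["connection", "bytes", "user", "url", "host"]:
    df = df.withColumn(key, F.col("kv")[key])

df.drop("kv", "message_log").show(truncate=False)

If the format never varies, the same flattening could presumably also be done entirely with built-in functions (regexp_extract plus Spark SQL's str_to_map), which tends to perform better than a Python UDF since it avoids shipping rows out to the Python worker.

Thanks,
Anbu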