I can't say this is the best way to do it, but my first thought is as
follows:
Create two DataFrames:
sc.hadoopConfiguration.set(XmlInputFormat.START_TAG_KEY, s"")
sc.hadoopConfiguration.set(XmlInputFormat.END_TAG_KEY, s"")
sc.hadoopConfiguration.set(XmlInputFormat.ENCODING_KEY, "UTF-8")
val strXmlDf = sc
Hello Experts,
I’m using the spark-xml package, which automatically infers my schema and
creates a DataFrame.
I’m extracting a few fields like id and name (which are unique) from the XML below, but
my requirement is to store the entire XML in one of the columns as well. I’m writing
this data to AVRO hive