Hi Sreekanth,

Assuming you are using Spark 1.x, I believe the code below:

sqlContext.read.format("com.databricks.spark.xml")
  .option("rowTag", "emp")
  .load("/tmp/sample.xml")
  .selectExpr("manager.id", "manager.name",
              "explode(manager.subordinates.clerk) as clerk")
  .selectExpr("id", "name", "clerk.cid", "clerk.cname")
  .show()

would print the result you want:

+---+----+---+-----+
| id|name|cid|cname|
+---+----+---+-----+
|  1| foo|  1|  foo|
|  1| foo|  1|  foo|
+---+----+---+-----+

I hope this is helpful. Thanks!

2016-08-13 9:33 GMT+09:00 Sreekanth Jella <srikanth.je...@gmail.com>:

> Hi Folks,
>
> I am trying to flatten a variety of XMLs using DataFrames. I'm using the
> spark-xml package, which automatically infers my schema and creates a
> DataFrame.
>
> I do not want to hard-code any column names in the DataFrame, as I have
> many varieties of XML documents and each might have much deeper nesting of
> child nodes. I simply want to flatten any type of XML and then write the
> output data to a Hive table. Can you please give some expert advice on
> this?
>
> An example XML document and the expected output are given below.
>
> Sample XML:
>
> <emplist>
>   <emp>
>     <manager>
>       <id>1</id>
>       <name>foo</name>
>       <subordinates>
>         <clerk>
>           <cid>1</cid>
>           <cname>foo</cname>
>         </clerk>
>         <clerk>
>           <cid>1</cid>
>           <cname>foo</cname>
>         </clerk>
>       </subordinates>
>     </manager>
>   </emp>
> </emplist>
>
> Expected output:
>
> id, name, clerk.cid, clerk.cname
> 1, foo, 2, cname2
> 1, foo, 3, cname3
>
> Thanks,
> Sreekanth Jella
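
P.S. Since the quoted question asks how to flatten *any* XML without hard-coding column names, one common approach (a sketch of my own, not something from spark-xml itself) is to walk the inferred schema recursively and build the select list programmatically. Note that array fields such as subordinates.clerk still need an explicit explode before this helper applies; the helper below only handles nested structs:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

// Recursively collect fully-qualified column references for every leaf
// field in the schema, aliasing "a.b.c" to "a_b_c" so the flattened
// column names stay unique and Hive-friendly.
def flattenSchema(schema: StructType, prefix: String = null): Array[Column] =
  schema.fields.flatMap { f =>
    val name = if (prefix == null) f.name else s"$prefix.${f.name}"
    f.dataType match {
      case st: StructType => flattenSchema(st, name)
      case _              => Array(col(name).alias(name.replace(".", "_")))
    }
  }

// Hypothetical usage, assuming df has no unexploded array columns:
//   val flat = df.select(flattenSchema(df.schema): _*)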