Just ran that code and it works fine. Here is what I ran and the output:

    val ctx = SQLContext.getOrCreate(sc)
    val df = ctx.read.format("com.databricks.spark.xml")
      .option("rowTag", "book")
      .load("file:///tmp/sample.xml")
    df.printSchema()

    root
     |-- name: long (nullable = true)
     |-- orderId: long (nullable = true)
     |-- price: long (nullable = true)

What version are you using?

On Sun, Feb 21, 2016 at 2:14 PM Prathamesh Dharangutte <pratham.d...@gmail.com> wrote:

> This is the code I am using for parsing the xml file:
>
> import org.apache.spark.{SparkConf,SparkContext}
> import org.apache.spark.sql.{DataFrame,SQLContext}
> import com.databricks.spark.xml
>
> object XmlProcessing {
>
>   def main(args : Array[String]) = {
>
>     val conf = new SparkConf()
>       .setAppName("XmlProcessing")
>       .setMaster("local")
>
>     val sc = new SparkContext(conf)
>     val sqlContext : SQLContext = new org.apache.spark.sql.SQLContext(sc)
>
>     loadXMLdata(sqlContext)
>
>   }
>
>   def loadXMLdata(sqlContext : SQLContext) = {
>
>     var df : DataFrame = null
>
>     var newDf : DataFrame = null
>
>     df = sqlContext.read
>       .format("com.databricks.spark.xml")
>       .option("rowTag","book")
>       .load("/home/prathamsh/Workspace/Xml/datafiles/sample.xml")
>
>     df.printSchema()
>
>   }
>
> }
>
> On Sun, Feb 21, 2016 at 7:10 PM, Sebastian Piu <sebastian....@gmail.com> wrote:
>
>> Can you paste the code you are using?
>>
>> On Sun, 21 Feb 2016, 13:19 Prathamesh Dharangutte <pratham.d...@gmail.com> wrote:
>>
>>> I am trying to parse an xml file using spark-xml. But for some reason, when
>>> I print the schema it only shows root instead of the hierarchy. I am using
>>> sqlContext to read the data. I am proceeding according to this video:
>>> https://www.youtube.com/watch?v=NemEp53yGbI
>>>
>>> The structure of the xml file is somewhat like this:
>>>
>>> <books>
>>>   <book>
>>>     <name></name>
>>>     <price></price>
>>>     <orderId></orderId>
>>>   </book>
>>>   <book>
>>>     //Some more data
>>>   </book>
>>> </books>
>>>
>>> For some books there are multiple orders, i.e. a large number of orders,
>>> while for some it just occurs once as empty. I use the "rowTag" option
>>> as book. How do I proceed, or is there any other way to tackle this
>>> problem? Help would be much appreciated. Thank you.
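
For comparing results across setups, here is a minimal self-contained sketch of the same schema check. It assumes Spark 1.5 or later (for `SQLContext.getOrCreate`) and the spark-xml package on the classpath. The object name `XmlSchemaCheck` and the helper `loadBooks` are made up for illustration and do not come from the thread. The default path is the one from the snippet at the top of this message.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical helper object, not part of spark-xml or of the original code.
object XmlSchemaCheck {

  // Reads an XML file with spark-xml, treating each <book> element as one row.
  def loadBooks(sqlContext: SQLContext, path: String): DataFrame =
    sqlContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "book")
      .load(path)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("XmlSchemaCheck").setMaster("local")
    val sc = new SparkContext(conf)
    try {
      val sqlContext = SQLContext.getOrCreate(sc)
      // Path comes from the command line; the fallback is the file used above.
      val path = args.headOption.getOrElse("file:///tmp/sample.xml")
      val df = loadBooks(sqlContext, path)
      df.printSchema()
      // Listing the columns makes an empty inferred schema explicit.
      println(s"columns: ${df.columns.mkString(", ")}")
    } finally {
      sc.stop()
    }
  }
}
```

If the printed schema is only `root`, the column list will be empty, meaning no fields were inferred. One possible cause, not confirmed anywhere in this thread, is that the `rowTag` value did not match any element in the file that was actually read, so it may be worth checking the path and the tag spelling along with the Spark and spark-xml versions.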