No, because you didn't say that explicitly. Can you share a sample file too?
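
One thing that may be worth trying while checking the file: pass an explicit schema instead of relying on inference, so that empty and repeated orderId elements do not affect the result. This is only a minimal sketch under assumptions. The field types are guesses based on the structure quoted below, the object name and file path are placeholders, and I have not verified that spark-xml 0.3.2 maps repeated elements onto an array column, so treat the `ArrayType` line as something to check.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Hypothetical object name, used for illustration only.
object XmlSchemaSketch {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("XmlSchemaSketch")
      .setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Assumed shape of one <book> row. Everything is read as a string so
    // that empty elements do not break type inference; orderId is declared
    // as an array because it may repeat (assumption, verify for your version).
    val bookSchema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("price", StringType, nullable = true),
      StructField("orderId", ArrayType(StringType), nullable = true)
    ))

    val df = sqlContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "book")
      .schema(bookSchema)             // skip schema inference
      .load("/path/to/sample.xml")    // placeholder path

    df.printSchema()
    sc.stop()
  }
}
```

If the schema printed this way shows the three fields but the inferred one still shows only root, that would point at inference or a version mismatch rather than at the file itself.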

On Sun, 21 Feb 2016, 14:34 Prathamesh Dharangutte <pratham.d...@gmail.com> wrote:

> I am using spark 1.4.0 with scala 2.10.4 and 0.3.2 of spark-xml
> Orderid is empty for some books and multiple entries of it for other
> books, did you include that in your xml file?
>
> *From: *Sebastian Piu
> *Sent: *Sunday, 21 February 2016 20:00
> *To: *Prathamesh Dharangutte
> *Cc: *user@spark.apache.org
> *Subject: *Re: spark-xml can't recognize schema
>
> Just ran that code and it works fine, here is the output:
>
> What version are you using?
>
> val ctx = SQLContext.getOrCreate(sc)
> val df = ctx.read.format("com.databricks.spark.xml").option("rowTag",
> "book").load("file:///tmp/sample.xml")
> df.printSchema()
>
> root
>  |-- name: long (nullable = true)
>  |-- orderId: long (nullable = true)
>  |-- price: long (nullable = true)
>
>
> On Sun, Feb 21, 2016 at 2:14 PM Prathamesh Dharangutte <
> pratham.d...@gmail.com> wrote:
>
>> This is the code I am using for parsing xml file:
>>
>> import org.apache.spark.{SparkConf,SparkContext}
>> import org.apache.spark.sql.{DataFrame,SQLContext}
>> import com.databricks.spark.xml
>>
>> object XmlProcessing {
>>
>>   def main(args : Array[String]) = {
>>
>>     val conf = new SparkConf()
>>       .setAppName("XmlProcessing")
>>       .setMaster("local")
>>
>>     val sc = new SparkContext(conf)
>>     val sqlContext : SQLContext = new org.apache.spark.sql.SQLContext(sc)
>>
>>     loadXMLdata(sqlContext)
>>
>>   }
>>
>>   def loadXMLdata(sqlContext : SQLContext) = {
>>
>>     var df : DataFrame = null
>>
>>     var newDf : DataFrame = null
>>
>>     df = sqlContext.read
>>       .format("com.databricks.spark.xml")
>>       .option("rowTag","book")
>>       .load("/home/prathamsh/Workspace/Xml/datafiles/sample.xml")
>>
>>     df.printSchema()
>>
>>   }
>>
>> }
>>
>>
>> On Sun, Feb 21, 2016 at 7:10 PM, Sebastian Piu <sebastian....@gmail.com>
>> wrote:
>>
>>> Can you paste the code you are using?
>>>
>>> On Sun, 21 Feb 2016, 13:19 Prathamesh Dharangutte <
>>> pratham.d...@gmail.com> wrote:
>>>
>>>> I am trying to parse an xml file using spark-xml. But for some reason when
>>>> I print the schema it only shows root instead of the hierarchy. I am using
>>>> sqlcontext to read the data. I am proceeding according to this video:
>>>> https://www.youtube.com/watch?v=NemEp53yGbI
>>>>
>>>> The structure of the xml file is somewhat like this:
>>>>
>>>> <books>
>>>>   <book>
>>>>     <name></name>
>>>>     <price></price>
>>>>     <orderId></orderId>
>>>>   </book>
>>>>   <book>
>>>>     //Some more data
>>>>   </book>
>>>> </books>
>>>>
>>>> For some books there are multiple orders, i.e. a large number of orders,
>>>> while for some it just occurs once as empty. I use the "rowtag" attribute
>>>> as book. How do I proceed, or is there any other way to tackle this
>>>> problem? Help would be much appreciated. Thank you.
>>>>
>>>
>>
>