Make sure the XML input file is well formed (check your end tags).
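
One quick way to check this before handing the file to spark-xml is to run it through the JDK's SAX parser, which stops at the first unclosed or mismatched tag. What follows is only a rough sketch: the object name `XmlWellFormedCheck` and the helper `firstError` are made up for illustration, and it relies on nothing beyond the standard `javax.xml.parsers` classes.

```scala
import java.io.{FileInputStream, InputStream}
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.SAXParseException
import org.xml.sax.helpers.DefaultHandler

// Hypothetical helper, not part of spark-xml.
object XmlWellFormedCheck {

  // Returns None when the stream parses cleanly, or Some(message) with the
  // position of the first well-formedness error reported by the SAX parser.
  def firstError(in: InputStream): Option[String] = {
    val parser = SAXParserFactory.newInstance().newSAXParser()
    try {
      // DefaultHandler ignores the content; its fatalError rethrows the exception.
      parser.parse(in, new DefaultHandler())
      None
    } catch {
      case e: SAXParseException =>
        Some(s"line ${e.getLineNumber}, column ${e.getColumnNumber}: ${e.getMessage}")
    }
  }

  def main(args: Array[String]): Unit = {
    if (args.isEmpty) {
      System.err.println("usage: XmlWellFormedCheck <path-to-xml>")
      sys.exit(1)
    }
    val in = new FileInputStream(args(0))
    try {
      firstError(in) match {
        case None      => println("well formed")
        case Some(msg) => println(s"not well formed: $msg")
      }
    } finally in.close()
  }
}
```

Run it with the path to your sample.xml as the only argument. If it reports an error, fix that tag first and then try printSchema() again.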

> On Feb 21, 2016, at 8:14 AM, Prathamesh Dharangutte <pratham.d...@gmail.com> wrote:
>
> This is the code I am using for parsing xml file:
>
> import org.apache.spark.{SparkConf,SparkContext}
> import org.apache.spark.sql.{DataFrame,SQLContext}
> import com.databricks.spark.xml
>
> object XmlProcessing {
>
>   def main(args : Array[String]) = {
>
>     val conf = new SparkConf()
>       .setAppName("XmlProcessing")
>       .setMaster("local")
>
>     val sc = new SparkContext(conf)
>     val sqlContext : SQLContext = new org.apache.spark.sql.SQLContext(sc)
>
>     loadXMLdata(sqlContext)
>
>   }
>
>   def loadXMLdata(sqlContext : SQLContext) = {
>
>     var df : DataFrame = null
>
>     var newDf : DataFrame = null
>
>     df = sqlContext.read
>       .format("com.databricks.spark.xml")
>       .option("rowTag","book")
>       .load("/home/prathamsh/Workspace/Xml/datafiles/sample.xml")
>
>     df.printSchema()
>
>   }
>
> }
>
>> On Sun, Feb 21, 2016 at 7:10 PM, Sebastian Piu <sebastian....@gmail.com> wrote:
>> Can you paste the code you are using?
>>
>>> On Sun, 21 Feb 2016, 13:19 Prathamesh Dharangutte <pratham.d...@gmail.com> wrote:
>>> I am trying to parse xml file using spark-xml. But for some reason when i
>>> print schema it only shows root instead of the hierarchy. I am using
>>> sqlcontext to read the data. I am proceeding according to this video :
>>> https://www.youtube.com/watch?v=NemEp53yGbI
>>>
>>> The structure of xml file is somewhat like this:
>>>
>>> <books>
>>>   <book>
>>>     <name></name>
>>>     <price></price>
>>>     <orderId></orderId>
>>>   </book>
>>>   <book>
>>>     //Some more data
>>>   </book>
>>> </books>
>>>
>>> For some books there are multiple orders i.e. large number of orders while
>>> for some it just occurs once as empty. I use the "rowtag" attribute as
>>> book. How do i proceed or is there any other way to tackle this problem?
>>> Help would be much appreciated. Thank you.