I am using Spark 1.4.0 with Scala 2.10.4 and spark-xml 0.3.2.
orderId is empty for some books and has multiple entries for others. Did you include that in your XML file?

From: Sebastian Piu
Sent: Sunday, 21 February 2016 20:00
To: Prathamesh Dharangutte
Cc: user@spark.apache.org
Subject: Re: spark-xml can't recognize schema

I just ran that code and it works fine. What version are you using?

Here is the output:

val ctx = SQLContext.getOrCreate(sc)
val df = ctx.read.format("com.databricks.spark.xml").option("rowTag", "book").load("file:///tmp/sample.xml")
df.printSchema()
root
 |-- name: long (nullable = true)
 |-- orderId: long (nullable = true)
 |-- price: long (nullable = true)



On Sun, Feb 21, 2016 at 2:14 PM Prathamesh Dharangutte <pratham.d...@gmail.com> wrote:
This is the code I am using to parse the XML file:



import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import com.databricks.spark.xml

object XmlProcessing {

  def main(args: Array[String]): Unit = {
    // Run locally with a single worker thread.
    val conf = new SparkConf()
      .setAppName("XmlProcessing")
      .setMaster("local")

    val sc = new SparkContext(conf)
    val sqlContext: SQLContext = new SQLContext(sc)

    loadXMLdata(sqlContext)
  }

  def loadXMLdata(sqlContext: SQLContext): Unit = {
    // Treat each <book> element as one row and let spark-xml infer the schema.
    val df: DataFrame = sqlContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "book")
      .load("/home/prathamsh/Workspace/Xml/datafiles/sample.xml")

    df.printSchema()
  }

}






On Sun, Feb 21, 2016 at 7:10 PM, Sebastian Piu <sebastian....@gmail.com> wrote:

Can you paste the code you are using?


On Sun, 21 Feb 2016, 13:19 Prathamesh Dharangutte <pratham.d...@gmail.com> wrote:
I am trying to parse an XML file using spark-xml, but for some reason, when I print the schema, it shows only root instead of the hierarchy. I am using SQLContext to read the data, following this video: https://www.youtube.com/watch?v=NemEp53yGbI

The structure of the XML file is roughly like this:

<books>
  <book>
    <name></name>
    <price></price>
    <orderId></orderId>
  </book>
  <book>
    <!-- Some more data -->
  </book>
</books>

Some books have multiple orders (a large number of orderId entries), while for others orderId occurs just once and is empty. I set the "rowTag" option to book. How do I proceed, or is there another way to tackle this problem? Help would be much appreciated. Thank you.
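
One way to handle a field that is sometimes empty and sometimes repeated is to skip schema inference and pass an explicit schema to the reader. The following is a minimal sketch, not a confirmed fix for the problem above. The object name XmlExplicitSchema, the column types (string name, long price, array of long for orderId) and the placeholder path are all assumptions chosen for illustration, and whether this version of spark-xml maps repeated elements onto the array column exactly as hoped has not been verified here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{ArrayType, LongType, StringType, StructField, StructType}

object XmlExplicitSchema {

  // Hypothetical schema: field names come from the sample XML above,
  // the types are assumptions. orderId is declared as an array so that
  // a book with many <orderId> elements and a book with a single empty
  // one can share the same column type.
  val bookSchema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("price", LongType, nullable = true),
    StructField("orderId", ArrayType(LongType, containsNull = true), nullable = true)
  ))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("XmlExplicitSchema")
      .setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Supplying a schema means the reader does not need to infer one.
    val df = sqlContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "book")
      .schema(bookSchema)
      .load("/path/to/sample.xml") // placeholder path, replace with the real file

    df.printSchema()
    sc.stop()
  }
}
```

If printSchema() still shows only root with an explicit schema supplied, the schema is probably not reaching the reader, which would point toward the library version or the classpath rather than the contents of the file.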



---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
