No, because you didn't say that explicitly. Can you share a sample file too?
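
For reference, a minimal sketch of how such a sample file could be generated, assuming the structure described further down the thread (one book with several orderId entries, one with a single empty orderId). The object name SampleXml and all names, prices and order ids are made up for illustration; they are not taken from the real data.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object SampleXml {
  // Hypothetical sample data: the values below are invented for illustration only.
  val content: String =
    """<books>
      |  <book>
      |    <name>Book A</name>
      |    <price>10</price>
      |    <orderId>1001</orderId>
      |    <orderId>1002</orderId>
      |  </book>
      |  <book>
      |    <name>Book B</name>
      |    <price>20</price>
      |    <orderId></orderId>
      |  </book>
      |</books>
      |""".stripMargin

  // Writes the sample to a fresh temporary file and returns its path.
  def write(): Path = {
    val path = Files.createTempFile("sample", ".xml")
    Files.write(path, content.getBytes(StandardCharsets.UTF_8))
    path
  }

  def main(args: Array[String]): Unit = println(write())
}
```

The printed path can then be passed to .load(...) with rowTag set to "book", as in the code quoted below.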

On Sun, 21 Feb 2016, 14:34 Prathamesh Dharangutte <pratham.d...@gmail.com>
wrote:

> I am using Spark 1.4.0 with Scala 2.10.4 and version 0.3.2 of spark-xml.
> orderId is empty for some books and has multiple entries for other
> books. Did you include that in your XML file?
>
> *From: *Sebastian Piu
> *Sent: *Sunday, 21 February 2016 20:00
> *To: *Prathamesh Dharangutte
> *Cc: *user@spark.apache.org
> *Subject: *Re: spark-xml can't recognize schema
>
> I just ran that code and it works fine; the code I ran and its output are
> below.
>
> What version are you using?
>
> val ctx = SQLContext.getOrCreate(sc)
> val df = ctx.read.format("com.databricks.spark.xml").option("rowTag", 
> "book").load("file:///tmp/sample.xml")
> df.printSchema()
>
> root
>  |-- name: long (nullable = true)
>  |-- orderId: long (nullable = true)
>  |-- price: long (nullable = true)
>
>
>
> On Sun, Feb 21, 2016 at 2:14 PM Prathamesh Dharangutte <
> pratham.d...@gmail.com> wrote:
>
>> This is the code I am using to parse the XML file:
>>
>>
>>
>> import org.apache.spark.{SparkConf,SparkContext}
>> import org.apache.spark.sql.{DataFrame,SQLContext}
>> import com.databricks.spark.xml
>>
>>
>> object XmlProcessing {
>>
>> def main(args : Array[String]) = {
>>
>>     val conf = new SparkConf()
>>         .setAppName("XmlProcessing")
>>         .setMaster("local")
>>
>>     val sc = new SparkContext(conf)
>>     val sqlContext : SQLContext = new org.apache.spark.sql.SQLContext(sc)
>>
>>     loadXMLdata(sqlContext)
>>
>>     }
>>
>> def loadXMLdata(sqlContext : SQLContext) = {
>>
>>     // Read each <book> element as one row and let spark-xml infer the schema.
>>     val df : DataFrame = sqlContext.read
>>         .format("com.databricks.spark.xml")
>>         .option("rowTag","book")
>>         .load("/home/prathamsh/Workspace/Xml/datafiles/sample.xml")
>>
>>     df.printSchema()
>>
>>
>>
>>     }
>>
>> }
>>
>>
>>
>>
>>
>>
>> On Sun, Feb 21, 2016 at 7:10 PM, Sebastian Piu <sebastian....@gmail.com>
>> wrote:
>>
>>> Can you paste the code you are using?
>>>
>>> On Sun, 21 Feb 2016, 13:19 Prathamesh Dharangutte <
>>> pratham.d...@gmail.com> wrote:
>>>
>>>> I am trying to parse an XML file using spark-xml, but for some reason when
>>>> I print the schema it only shows root instead of the hierarchy. I am using
>>>> SQLContext to read the data. I am following this video:
>>>> https://www.youtube.com/watch?v=NemEp53yGbI
>>>>
>>>> The structure of the XML file is roughly like this:
>>>>
>>>> <books>
>>>>   <book>
>>>>     <name></name>
>>>>     <price></price>
>>>>     <orderId></orderId>
>>>>   </book>
>>>>   <book>
>>>>     <!-- Some more data -->
>>>>   </book>
>>>> </books>
>>>>
>>>> For some books there are multiple orders, i.e. a large number of orderId
>>>> entries, while for others orderId occurs just once and is empty. I set the
>>>> "rowTag" option to "book". How do I proceed, or is there another way to
>>>> tackle this problem? Help would be much appreciated. Thank you.
>>>>
>>>
>>
>
