Great, thanks for pointing this out.
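For the archives, here is a minimal sketch of the reader call the thread below converges on: the data source name is "com.databricks.spark.xml" (not "org.apache.spark.xml"), on Spark 1.6 with spark-xml 0.3.3. The class and file names are the ones from the thread; this needs a Spark runtime and the spark-xml jar on the classpath to actually run.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class PostsProcessing {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("PostsProcessing")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // The format string must be the spark-xml package name,
        // "com.databricks.spark.xml" -- "org.apache.spark.xml" does not exist.
        DataFrame df = sqlContext.read()
                .format("com.databricks.spark.xml")
                .option("rowTag", "row")
                .load("A.xml");

        df.printSchema();
        sc.stop();
    }
}
```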
On Fri, Jun 17, 2016 at 6:21 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Please see https://github.com/databricks/spark-xml/issues/92
>
> On Fri, Jun 17, 2016 at 5:19 AM, VG <vlin...@gmail.com> wrote:
>
>> I am using spark-xml for loading data and creating a data frame.
>>
>> If the xml element has sub-elements and values, then it works fine. For
>> example, if the xml element is like
>>
>> <a val="1">
>>   <b>test</b>
>> </a>
>>
>> however, if the xml element is bare with just attributes, then it does
>> not work. Any suggestions?
>>
>> <a val="1" />  does not load the data.
>>
>> Any suggestions to fix this?
>>
>> On Fri, Jun 17, 2016 at 4:28 PM, Siva A <siva9940261...@gmail.com> wrote:
>>
>>> Use Spark XML version 0.3.3:
>>>
>>> <dependency>
>>>     <groupId>com.databricks</groupId>
>>>     <artifactId>spark-xml_2.10</artifactId>
>>>     <version>0.3.3</version>
>>> </dependency>
>>>
>>> On Fri, Jun 17, 2016 at 4:25 PM, VG <vlin...@gmail.com> wrote:
>>>
>>>> Hi Siva,
>>>>
>>>> This is what I have for jars. Did you manage to run with these or
>>>> different versions?
>>>>
>>>> <dependency>
>>>>     <groupId>org.apache.spark</groupId>
>>>>     <artifactId>spark-core_2.10</artifactId>
>>>>     <version>1.6.1</version>
>>>> </dependency>
>>>> <dependency>
>>>>     <groupId>org.apache.spark</groupId>
>>>>     <artifactId>spark-sql_2.10</artifactId>
>>>>     <version>1.6.1</version>
>>>> </dependency>
>>>> <dependency>
>>>>     <groupId>com.databricks</groupId>
>>>>     <artifactId>spark-xml_2.10</artifactId>
>>>>     <version>0.2.0</version>
>>>> </dependency>
>>>> <dependency>
>>>>     <groupId>org.scala-lang</groupId>
>>>>     <artifactId>scala-library</artifactId>
>>>>     <version>2.10.6</version>
>>>> </dependency>
>>>>
>>>> Thanks
>>>> VG
>>>>
>>>> On Fri, Jun 17, 2016 at 4:16 PM, Siva A <siva9940261...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Marco,
>>>>>
>>>>> I did run it in an IDE (IntelliJ) as well. It works fine.
>>>>> VG, make sure the right jar is in the classpath.
>>>>>
>>>>> --Siva
>>>>>
>>>>> On Fri, Jun 17, 2016 at 4:11 PM, Marco Mistroni <mmistr...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> And your eclipse path is correct?
>>>>>> I suggest, as Siva did before, to build your jar and run it via
>>>>>> spark-submit by specifying the --packages option.
>>>>>> It's as simple as running this command:
>>>>>>
>>>>>> spark-submit --packages com.databricks:spark-xml_<scala version>:<package version> \
>>>>>>   --class <name of your class containing main> <path to your jar>
>>>>>>
>>>>>> Indeed, if you have only these lines to run, why don't you try them
>>>>>> in spark-shell?
>>>>>>
>>>>>> hth
>>>>>>
>>>>>> On Fri, Jun 17, 2016 at 11:32 AM, VG <vlin...@gmail.com> wrote:
>>>>>>
>>>>>>> Nope. Eclipse.
>>>>>>>
>>>>>>> On Fri, Jun 17, 2016 at 3:58 PM, Siva A <siva9940261...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> If you are running from an IDE, are you using IntelliJ?
>>>>>>>>
>>>>>>>> On Fri, Jun 17, 2016 at 3:20 PM, Siva A <siva9940261...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Can you try to package it as a jar and run it using spark-submit?
>>>>>>>>>
>>>>>>>>> Siva
>>>>>>>>>
>>>>>>>>> On Fri, Jun 17, 2016 at 3:17 PM, VG <vlin...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I am trying to run from the IDE and everything else is working
>>>>>>>>>> fine.
>>>>>>>>>> I added the spark-xml jar and now I ended up with this dependency
>>>>>>>>>> error:
>>>>>>>>>>
>>>>>>>>>> 16/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager
>>>>>>>>>> Exception in thread "main" *java.lang.NoClassDefFoundError:
>>>>>>>>>> scala/collection/GenTraversableOnce$class*
>>>>>>>>>> at org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.<init>(ddl.scala:150)
>>>>>>>>>> at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154)
>>>>>>>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>>>>>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>>>>>>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>>>>>>>> Caused by: *java.lang.ClassNotFoundException:
>>>>>>>>>> scala.collection.GenTraversableOnce$class*
>>>>>>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>>>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>>>>>> ... 5 more
>>>>>>>>>> 16/06/17 15:15:58 INFO SparkContext: Invoking stop() from
>>>>>>>>>> shutdown hook
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 17, 2016 at 2:59 PM, Marco Mistroni
>>>>>>>>>> <mmistr...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> So are you using spark-submit or spark-shell?
>>>>>>>>>>>
>>>>>>>>>>> You will need to launch either one by passing the --packages
>>>>>>>>>>> option (as in the example below for spark-csv). You will need
>>>>>>>>>>> to know:
>>>>>>>>>>>
>>>>>>>>>>> --packages com.databricks:spark-xml_<scala.version>:<package version>
>>>>>>>>>>>
>>>>>>>>>>> hth
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 17, 2016 at 10:20 AM, VG <vlin...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Apologies for that.
>>>>>>>>>>>> I am trying to use spark-xml to load data from an xml file.
>>>>>>>>>>>>
>>>>>>>>>>>> Here is the exception:
>>>>>>>>>>>>
>>>>>>>>>>>> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered
>>>>>>>>>>>> BlockManager
>>>>>>>>>>>> Exception in thread "main" java.lang.ClassNotFoundException:
>>>>>>>>>>>> Failed to find data source: org.apache.spark.xml. Please find
>>>>>>>>>>>> packages at http://spark-packages.org
>>>>>>>>>>>> at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
>>>>>>>>>>>> at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
>>>>>>>>>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>>>>>>>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>>>>>>>>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>>>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>>>>>> org.apache.spark.xml.DefaultSource
>>>>>>>>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>>>>>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>>>>>>>> at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>>>>>>>>>>>> at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>>>>>>>>>>>> at scala.util.Try$.apply(Try.scala:192)
>>>>>>>>>>>> at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>>>>>>>>>>>> at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>>>>>>>>>>>> at scala.util.Try.orElse(Try.scala:84)
>>>>>>>>>>>> at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
>>>>>>>>>>>> ... 4 more
>>>>>>>>>>>>
>>>>>>>>>>>> Code:
>>>>>>>>>>>>
>>>>>>>>>>>> SQLContext sqlContext = new SQLContext(sc);
>>>>>>>>>>>> DataFrame df = sqlContext.read()
>>>>>>>>>>>>     .format("org.apache.spark.xml")
>>>>>>>>>>>>     .option("rowTag", "row")
>>>>>>>>>>>>     .load("A.xml");
>>>>>>>>>>>>
>>>>>>>>>>>> Any suggestions please.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 17, 2016 at 2:42 PM, Marco Mistroni
>>>>>>>>>>>> <mmistr...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Too little info.
>>>>>>>>>>>>> It'll help if you can post the exception and show your sbt
>>>>>>>>>>>>> file (if you are using sbt), and provide minimal details on
>>>>>>>>>>>>> what you are doing.
>>>>>>>>>>>>> kr
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jun 17, 2016 at 10:08 AM, VG <vlin...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Failed to find data source: com.databricks.spark.xml
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any suggestions to resolve this?
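Stepping back, the build-file side of the fix can be captured in one place. This is a sketch of a Maven dependency set consistent with the advice in the thread (spark-xml 0.3.3, everything on the _2.10 Scala suffix); the NoClassDefFoundError for scala/collection/GenTraversableOnce$class above is typically a Scala version mismatch between the spark-xml artifact and the Scala runtime Spark is using, so the key point is that all `_<scala version>` suffixes agree with the scala-library version.

```xml
<!-- Sketch: all Spark-related artifacts share one Scala suffix (_2.10 here),
     matching the scala-library version below. -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.1</version>
</dependency>
<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-xml_2.10</artifactId>
    <version>0.3.3</version>
</dependency>
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.10.6</version>
</dependency>
```

With spark-submit instead of an IDE, the same spark-xml coordinates go on the command line as --packages com.databricks:spark-xml_2.10:0.3.3 and no pom entry for spark-xml is needed in the runtime classpath.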