I am using spark-xml for loading data and creating a data frame. If the XML element has sub-elements and values, then it works fine. For example, an element like

<a val="1"> <b>test</b> </a>

loads correctly.
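For anyone hitting the same attribute-only issue: with spark-xml 0.3.3 (the version Siva suggests just below), attributes are exposed as columns named with the library's default attributePrefix of "_". A minimal sketch of the intended read, assuming a file A.xml whose rows are the bare <a val="1"/> elements; the file name and rowTag value are illustrative, not from the original post:

    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    // sc is an existing SparkContext
    SQLContext sqlContext = new SQLContext(sc);
    DataFrame df = sqlContext.read()
            .format("com.databricks.spark.xml")  // spark-xml data source
            .option("rowTag", "a")               // each <a .../> element becomes a row
            .load("A.xml");
    df.select("_val").show();                    // attribute val surfaces as column _val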
However, if the XML element is bare, with just attributes, then it does not work:

<a val="1" />

does not load the data. Any suggestions on how to fix this?

On Fri, Jun 17, 2016 at 4:28 PM, Siva A <siva9940261...@gmail.com> wrote:

> Use Spark XML version 0.3.3:
>
> <dependency>
>     <groupId>com.databricks</groupId>
>     <artifactId>spark-xml_2.10</artifactId>
>     <version>0.3.3</version>
> </dependency>
>
> On Fri, Jun 17, 2016 at 4:25 PM, VG <vlin...@gmail.com> wrote:
>
>> Hi Siva,
>>
>> This is what I have for jars. Did you manage to run with these, or with
>> different versions?
>>
>> <dependency>
>>     <groupId>org.apache.spark</groupId>
>>     <artifactId>spark-core_2.10</artifactId>
>>     <version>1.6.1</version>
>> </dependency>
>> <dependency>
>>     <groupId>org.apache.spark</groupId>
>>     <artifactId>spark-sql_2.10</artifactId>
>>     <version>1.6.1</version>
>> </dependency>
>> <dependency>
>>     <groupId>com.databricks</groupId>
>>     <artifactId>spark-xml_2.10</artifactId>
>>     <version>0.2.0</version>
>> </dependency>
>> <dependency>
>>     <groupId>org.scala-lang</groupId>
>>     <artifactId>scala-library</artifactId>
>>     <version>2.10.6</version>
>> </dependency>
>>
>> Thanks,
>> VG
>>
>> On Fri, Jun 17, 2016 at 4:16 PM, Siva A <siva9940261...@gmail.com> wrote:
>>
>>> Hi Marco,
>>>
>>> I did run it in the IDE (IntelliJ) as well. It works fine.
>>> VG, make sure the right jar is in the classpath.
>>>
>>> --Siva
>>>
>>> On Fri, Jun 17, 2016 at 4:11 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
>>>
>>>> And your Eclipse path is correct?
>>>> I suggest, as Siva did before, building your jar and running it via
>>>> spark-submit with the --packages option. It's as simple as running this
>>>> command:
>>>>
>>>> spark-submit --packages com.databricks:spark-xml_<scalaversion>:<packageversion> --class <Name of your class containing main> <path to your jar>
>>>>
>>>> Indeed, if you have only these lines to run, why don't you try them in
>>>> spark-shell?
>>>>
>>>> hth
>>>>
>>>> On Fri, Jun 17, 2016 at 11:32 AM, VG <vlin...@gmail.com> wrote:
>>>>
>>>>> Nope, Eclipse.
>>>>>
>>>>> On Fri, Jun 17, 2016 at 3:58 PM, Siva A <siva9940261...@gmail.com> wrote:
>>>>>
>>>>>> If you are running from an IDE, are you using IntelliJ?
>>>>>>
>>>>>> On Fri, Jun 17, 2016 at 3:20 PM, Siva A <siva9940261...@gmail.com> wrote:
>>>>>>
>>>>>>> Can you try packaging it as a jar and running it with spark-submit?
>>>>>>>
>>>>>>> Siva
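To make Marco's template above concrete, here is a sketch filled in with the coordinates from this thread (Scala 2.10, spark-xml 0.3.3, and the main class that appears in the stack traces further down); the jar path is an assumption:

    spark-submit \
      --packages com.databricks:spark-xml_2.10:0.3.3 \
      --class org.ariba.spark.PostsProcessing \
      target/posts-processing.jar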
>>>>>>> On Fri, Jun 17, 2016 at 3:17 PM, VG <vlin...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I am trying to run from the IDE, and everything else is working fine.
>>>>>>>> I added the spark-xml jar, and now I end up with this dependency error:
>>>>>>>>
>>>>>>>> 16/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager
>>>>>>>> Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
>>>>>>>>     at org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.<init>(ddl.scala:150)
>>>>>>>>     at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154)
>>>>>>>>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>>>>>>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>>>>>>>     at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>>>>>> Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class
>>>>>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>>>>     ... 5 more
>>>>>>>> 16/06/17 15:15:58 INFO SparkContext: Invoking stop() from shutdown hook
>>>>>>>>
>>>>>>>> On Fri, Jun 17, 2016 at 2:59 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> So you are using spark-submit or spark-shell?
>>>>>>>>>
>>>>>>>>> You will need to launch either one by passing the --packages option
>>>>>>>>> (like in the example below). You will need to know:
>>>>>>>>>
>>>>>>>>> --packages com.databricks:spark-xml_<scala.version>:<package version>
>>>>>>>>>
>>>>>>>>> hth
>>>>>>>>>
>>>>>>>>> On Fri, Jun 17, 2016 at 10:20 AM, VG <vlin...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Apologies for that.
>>>>>>>>>> I am trying to use spark-xml to load data from an XML file.
>>>>>>>>>>
>>>>>>>>>> Here is the exception:
>>>>>>>>>>
>>>>>>>>>> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager
>>>>>>>>>> Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.xml. Please find packages at http://spark-packages.org
>>>>>>>>>>     at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
>>>>>>>>>>     at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
>>>>>>>>>>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>>>>>>>>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>>>>>>>>>     at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>>>>>>>> Caused by: java.lang.ClassNotFoundException: org.apache.spark.xml.DefaultSource
>>>>>>>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>>>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>>>>>>     at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>>>>>>>>>>     at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>>>>>>>>>>     at scala.util.Try$.apply(Try.scala:192)
>>>>>>>>>>     at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>>>>>>>>>>     at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>>>>>>>>>>     at scala.util.Try.orElse(Try.scala:84)
>>>>>>>>>>     at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
>>>>>>>>>>     ... 4 more
>>>>>>>>>>
>>>>>>>>>> Code:
>>>>>>>>>>
>>>>>>>>>> SQLContext sqlContext = new SQLContext(sc);
>>>>>>>>>> DataFrame df = sqlContext.read()
>>>>>>>>>>     .format("org.apache.spark.xml")
>>>>>>>>>>     .option("rowTag", "row")
>>>>>>>>>>     .load("A.xml");
>>>>>>>>>>
>>>>>>>>>> Any suggestions please ..
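The format string in the snippet just above is the immediate cause of that exception: there is no org.apache.spark.xml data source. The spark-xml package registers itself as com.databricks.spark.xml, so the same read would look like this (rowTag value kept from the original):

    SQLContext sqlContext = new SQLContext(sc);
    DataFrame df = sqlContext.read()
            .format("com.databricks.spark.xml")  // correct data source name for spark-xml
            .option("rowTag", "row")
            .load("A.xml");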
>>>>>>>>>> On Fri, Jun 17, 2016 at 2:42 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Too little info. It'll help if you can post the exception, show your
>>>>>>>>>>> sbt file (if you are using sbt), and provide minimal details on what
>>>>>>>>>>> you are doing.
>>>>>>>>>>>
>>>>>>>>>>> kr
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 17, 2016 at 10:08 AM, VG <vlin...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Failed to find data source: com.databricks.spark.xml
>>>>>>>>>>>>
>>>>>>>>>>>> Any suggestions to resolve this?
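Pulling the thread together: the NoClassDefFoundError on scala.collection.GenTraversableOnce$class earlier in the thread is the classic symptom of mixed Scala binary versions on the classpath (for example, a _2.11 spark-xml jar next to _2.10 Spark artifacts), while this last "Failed to find data source" error means the spark-xml jar is missing from the classpath entirely. A consolidated dependency set built from the versions quoted above, with every artifact kept on the same _2.10 line and spark-xml bumped to 0.3.3 as Siva suggested:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.6.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.6.1</version>
    </dependency>
    <dependency>
        <groupId>com.databricks</groupId>
        <artifactId>spark-xml_2.10</artifactId>
        <version>0.3.3</version>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.10.6</version>
    </dependency>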