Different versions of dependencies in assembly/target/scala-2.11/jars?
Hi,

Just noticed in assembly/target/scala-2.11/jars that similar libraries have different versions:

-rw-r--r--  1 jacek  staff  1230201 17 wrz 09:51 netty-3.8.0.Final.jar
-rw-r--r--  1 jacek  staff  2305335 17 wrz 09:51 netty-all-4.0.41.Final.jar

and

-rw-r--r--  1 jacek  staff   218076 17 wrz 09:51 parquet-hadoop-1.8.1.jar
-rw-r--r--  1 jacek  staff  2796935 17 wrz 09:51 parquet-hadoop-bundle-1.6.0.jar

and

-rw-r--r--  1 jacek  staff    46983 17 wrz 09:51 jackson-annotations-2.6.5.jar
-rw-r--r--  1 jacek  staff   258876 17 wrz 09:51 jackson-core-2.6.5.jar
-rw-r--r--  1 jacek  staff   232248 17 wrz 09:51 jackson-core-asl-1.9.13.jar
-rw-r--r--  1 jacek  staff  1171380 17 wrz 09:51 jackson-databind-2.6.5.jar
-rw-r--r--  1 jacek  staff    18336 17 wrz 09:51 jackson-jaxrs-1.9.13.jar
-rw-r--r--  1 jacek  staff   780664 17 wrz 09:51 jackson-mapper-asl-1.9.13.jar
-rw-r--r--  1 jacek  staff    41263 17 wrz 09:51 jackson-module-paranamer-2.6.5.jar
-rw-r--r--  1 jacek  staff   515604 17 wrz 09:51 jackson-module-scala_2.11-2.6.5.jar
-rw-r--r--  1 jacek  staff    27084 17 wrz 09:51 jackson-xc-1.9.13.jar

and

-rw-r--r--  1 jacek  staff   188671 17 wrz 09:51 commons-beanutils-1.7.0.jar
-rw-r--r--  1 jacek  staff   206035 17 wrz 09:51 commons-beanutils-core-1.8.0.jar

and

-rw-r--r--  1 jacek  staff   445288 17 wrz 09:51 antlr-2.7.7.jar
-rw-r--r--  1 jacek  staff   164368 17 wrz 09:51 antlr-runtime-3.4.jar
-rw-r--r--  1 jacek  staff   302248 17 wrz 09:51 antlr4-runtime-4.5.3.jar

Even if that does not cause any class mismatches, it might be worth excluding them to minimize the size of the Spark distro.

What do you think?

Regards,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
Fwd: Question regarding merging two RDDs
Hi,

I have two separate CSV files, one with the header and the other with the data. I read those two files into two different RDDs and now I need to merge them. I tried various options such as union, zip, and join, but none worked for my problem. What is the best way to merge the two RDDs so that the header and data end up together in a new RDD?

Thanks,
Hiral Mehta
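One way to attack this at the RDD level (a hypothetical sketch, not from the original message — `spark` is an assumed `SparkSession` and the file paths are placeholders) is to read the single header line on the driver and prepend it with `union`. `RDD.union` concatenates the partitions of the first RDD followed by those of the second, so with a one-partition header RDD first, the header line precedes the data in actions like `collect` or `saveAsTextFile`:

```scala
// Sketch: merge a header-only file and a data-only file into one RDD
// of lines, header first. Paths and `spark` are illustrative assumptions.
val headerLine = spark.sparkContext.textFile("header.csv").first()
val dataLines  = spark.sparkContext.textFile("data.csv")

// Parallelize the header into a single partition so union keeps it in front.
val merged = spark.sparkContext
  .parallelize(Seq(headerLine), numSlices = 1)
  .union(dataLines)
```

If the goal is structured processing rather than a literal line-level merge, turning the header into a DataFrame schema (as suggested later in this thread) is usually the better fit.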
java.lang.NoClassDefFoundError, is this a bug?
Hi,

In my application, I got a weird error message:

java.lang.NoClassDefFoundError: Could not initialize class X

This happens only when I try to submit my application in cluster mode. It works perfectly in client mode.

I'm able to reproduce this error message with a simple 16-line program:
https://github.com/zasdfgbnm/spark-test1/blob/master/src/main/scala/test.scala

To reproduce it, simply clone this git repo and then execute a command like:

sbt package && spark-submit --master spark://localhost:7077 target/scala-2.11/createdataset_2.11-0.0.1-SNAPSHOT.jar

Can anyone check whether this is a bug in Spark?

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-NoClassDefFoundError-is-this-a-bug-tp18972.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
Re: Different versions of dependencies in assembly/target/scala-2.11/jars?
No, these are different major versions of these components, each of which gets used by something in the transitive dependency graph. They are not redundant because they're not actually presenting roughly the same component in the same namespace.

However, the parquet-hadoop bit looks wrong, in that it should be harmonized to one 1.x version. It's not that Spark uses inconsistent versions but that transitive deps do. We can still harmonize them in the build if it causes problems.

On Sat, Sep 17, 2016 at 8:14 PM, Jacek Laskowski wrote:
> Just noticed in assembly/target/scala-2.11/jars that similar libraries
> have different versions:
> [...]
> Even if that does not cause any class mismatches, it might be worth to
> exclude them to minimize the size of the Spark distro.
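Harmonizing a transitive dependency in a Maven build, as suggested above, is typically done with `dependencyManagement`, which forces one version regardless of what transitive resolution picks. This is only an illustrative fragment, not Spark's actual pom; the coordinates are real but the choice to pin this particular version here is an example:

```xml
<!-- Illustrative sketch: pin the transitive parquet-hadoop version
     in a parent pom so all modules resolve the same 1.x release. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-hadoop</artifactId>
      <version>1.8.1</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Note that `parquet-hadoop-bundle` is a separate shaded artifact (pulled in via Hive), so pinning `parquet-hadoop` alone would not remove it; it would need an explicit exclusion or its own managed version.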
Re: Different versions of dependencies in assembly/target/scala-2.11/jars?
Hi Sean,

Thanks a lot for the help understanding the different jars. Do you think there's anything that should be reported as an enhancement/issue/task in JIRA?

Regards,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Sat, Sep 17, 2016 at 11:34 PM, Sean Owen wrote:
> No, these are different major versions of these components, each of
> which gets used by something in the transitive dependency graph. They
> are not redundant because they're not actually presenting roughly the
> same component in the same namespace.
>
> However the parquet-hadoop bit looks wrong, in that it should be
> harmonized to one 1.x version. It's not that Spark uses inconsistent
> versions but that transitive deps do. We can still harmonize them in
> the build if it causes problems.
> [...]
Re: java.lang.NoClassDefFoundError, is this a bug?
Hi,

I'm surprised too. Here's the entire stack trace for reference. I'd also like to know what causes the issue.

Caused by: java.lang.NoClassDefFoundError: Could not initialize class Main$
  at Main$$anonfun$main$1.apply(test.scala:14)
  at Main$$anonfun$main$1.apply(test.scala:14)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
  at org.apache.spark.scheduler.Task.run(Task.scala:86)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:277)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)

Regards,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Sat, Sep 17, 2016 at 11:08 PM, Xiang Gao wrote:
> In my application, I got a weird error message:
> java.lang.NoClassDefFoundError: Could not initialize class X
>
> This happens only when I try to submit my application in cluster mode. It
> works perfectly in client mode.
> [...]
Re: java.lang.NoClassDefFoundError, is this a bug?
Besides, if you replace line #14 with:

Env.spark.createDataset(Seq("a","b","c")).rdd.map(func).collect()

you will have the same problem with a different stack trace:

Caused by: java.lang.NoClassDefFoundError: Could not initialize class Main$
  at Main$$anonfun$main$1.apply(test.scala:14)
  at Main$$anonfun$main$1.apply(test.scala:14)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
  at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
  at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
  at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
  at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
  at scala.collection.AbstractIterator.to(Iterator.scala:1336)
  at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
  at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
  at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
  at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
  at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1918)
  at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1918)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
  at org.apache.spark.scheduler.Task.run(Task.scala:86)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-NoClassDefFoundError-is-this-a-bug-tp18972p18976.html
Re: Fwd: Question regarding merging two RDDs
Maybe you can use a DataFrame, with the header file providing the schema.

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Fwd-Question-regarding-merging-to-two-RDDs-tp18971p18977.html
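The suggestion above could look roughly like this (a hypothetical sketch for Spark 2.x, not from the original reply — `spark` is an assumed `SparkSession` and the file paths are placeholders): read the single header line, turn its comma-separated names into a `StructType`, and apply that schema when reading the data file.

```scala
import org.apache.spark.sql.types.{StructField, StringType, StructType}

// Sketch: derive the schema from the header-only file...
val headerLine = spark.sparkContext.textFile("header.csv").first()
val schema = StructType(
  headerLine.split(",").map(name => StructField(name, StringType, nullable = true)))

// ...and apply it to the data-only file, yielding one named DataFrame.
val df = spark.read.schema(schema).csv("data.csv")
```

All columns are typed as strings here for simplicity; real column types would have to come from somewhere else, since a CSV header carries only names.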
Re: java.lang.NoClassDefFoundError, is this a bug?
Do you run this in YARN mode, or something else?

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-NoClassDefFoundError-is-this-a-bug-tp18972p18978.html
Re: java.lang.NoClassDefFoundError, is this a bug?
A Spark standalone cluster.

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-NoClassDefFoundError-is-this-a-bug-tp18972p18979.html
Re: java.lang.NoClassDefFoundError, is this a bug?
If I remove this:

abstract class A[T : Encoder] {}

it's OK!

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-NoClassDefFoundError-is-this-a-bug-tp18972p18980.html
Re: java.lang.NoClassDefFoundError, is this a bug?
Yes. Besides, if you change "T : Encoder" to "T", it's OK too.

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-NoClassDefFoundError-is-this-a-bug-tp18972p18981.html
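Piecing together the hints in this thread (the `Env.spark` reference, the `Main$` frames at test.scala:14, and the `A[T : Encoder]` superclass), the failing shape appears to be roughly the following. This is a hypothetical reconstruction, not the actual test.scala from the linked repo:

```scala
// Hypothetical reconstruction, NOT the actual 16-line program.
// The context bound [T : Encoder] gives class A an implicit
// Encoder[T] constructor parameter; on executors in cluster mode,
// initializing the Main$ singleton (referenced by the map closure)
// then fails, surfacing as "Could not initialize class Main$".
import org.apache.spark.sql.{Encoder, SparkSession}

object Env { val spark = SparkSession.builder.getOrCreate() }
import Env.spark.implicits._

abstract class A[T : Encoder] {}   // removing ": Encoder" avoids the error

object Main extends A[String] {
  def main(args: Array[String]): Unit = {
    // The closure below captures Main$; its class initialization on the
    // executor is what reportedly blows up in cluster mode.
    Env.spark.createDataset(Seq("a", "b", "c")).map(_ + "!").collect()
  }
}
```

Since only the object's initializer changes between the working and failing variants, the observation above that dropping the `Encoder` context bound makes the error disappear is consistent with this reading.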