Different versions of dependencies in assembly/target/scala-2.11/jars?

2016-09-17 Thread Jacek Laskowski
Hi,

Just noticed in assembly/target/scala-2.11/jars that similar libraries
have different versions:

-rw-r--r--  1 jacek  staff   1230201 17 wrz 09:51 netty-3.8.0.Final.jar
-rw-r--r--  1 jacek  staff   2305335 17 wrz 09:51 netty-all-4.0.41.Final.jar

and

-rw-r--r--  1 jacek  staff    218076 17 wrz 09:51 parquet-hadoop-1.8.1.jar
-rw-r--r--  1 jacek  staff   2796935 17 wrz 09:51 parquet-hadoop-bundle-1.6.0.jar

and

-rw-r--r--  1 jacek  staff     46983 17 wrz 09:51 jackson-annotations-2.6.5.jar
-rw-r--r--  1 jacek  staff    258876 17 wrz 09:51 jackson-core-2.6.5.jar
-rw-r--r--  1 jacek  staff    232248 17 wrz 09:51 jackson-core-asl-1.9.13.jar
-rw-r--r--  1 jacek  staff   1171380 17 wrz 09:51 jackson-databind-2.6.5.jar
-rw-r--r--  1 jacek  staff     18336 17 wrz 09:51 jackson-jaxrs-1.9.13.jar
-rw-r--r--  1 jacek  staff    780664 17 wrz 09:51 jackson-mapper-asl-1.9.13.jar
-rw-r--r--  1 jacek  staff     41263 17 wrz 09:51 jackson-module-paranamer-2.6.5.jar
-rw-r--r--  1 jacek  staff    515604 17 wrz 09:51 jackson-module-scala_2.11-2.6.5.jar
-rw-r--r--  1 jacek  staff     27084 17 wrz 09:51 jackson-xc-1.9.13.jar

and

-rw-r--r--  1 jacek  staff    188671 17 wrz 09:51 commons-beanutils-1.7.0.jar
-rw-r--r--  1 jacek  staff    206035 17 wrz 09:51 commons-beanutils-core-1.8.0.jar

and

-rw-r--r--  1 jacek  staff    445288 17 wrz 09:51 antlr-2.7.7.jar
-rw-r--r--  1 jacek  staff    164368 17 wrz 09:51 antlr-runtime-3.4.jar
-rw-r--r--  1 jacek  staff    302248 17 wrz 09:51 antlr4-runtime-4.5.3.jar

Even if that does not cause any class mismatches, it might be worth
excluding them to minimize the size of the Spark distro.

What do you think?

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Fwd: Question regarding merging to two RDDs

2016-09-17 Thread Hiral Mehta
Hi,

I have two separate CSV files, one with the header and the other with the
data. I read those two files into 2 different RDDs and now I need to merge
both RDDs.

I tried various options such as union, zip, and join, but none worked for my
problem. What is the best way to merge two RDDs so that the header and data
are combined into a new RDD?

Thanks,
Hiral Mehta


java.lang.NoClassDefFoundError, is this a bug?

2016-09-17 Thread Xiang Gao
Hi,

In my application, I got a weird error message:
java.lang.NoClassDefFoundError: Could not initialize class X

This happens only when I try to submit my application in cluster mode. It
works perfectly in client mode.

I'm able to reproduce this error message by a simple 16-line program:
https://github.com/zasdfgbnm/spark-test1/blob/master/src/main/scala/test.scala

To reproduce it, simply clone this git repo, and then execute a command like:
sbt package && spark-submit --master spark://localhost:7077
target/scala-2.11/createdataset_2.11-0.0.1-SNAPSHOT.jar

Can anyone check whether this is a bug in Spark?
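For what it's worth, the wording "Could not initialize class X" (as opposed
to "class not found") usually means the class *was* found, but its static
initializer already failed once. A minimal, Spark-free sketch of that JVM
behavior (hypothetical demo code, not taken from the linked repo):

```scala
object InitDemo {
  // First access to Bad runs its static initializer, which throws, producing
  // ExceptionInInitializerError with the real cause attached. Every later
  // access to the same (now erroneous) class yields
  // "NoClassDefFoundError: Could not initialize class InitDemo$Bad$",
  // hiding the original exception -- which is why the root cause is often
  // missing from executor-side stack traces.
  object Bad {
    private val zero = 0
    val v: Int = 1 / zero   // throws ArithmeticException during <clinit>
  }

  def main(args: Array[String]): Unit = {
    try { Bad.v } catch { case t: Throwable => println(t) } // ExceptionInInitializerError
    try { Bad.v } catch { case t: Throwable => println(t) } // NoClassDefFoundError
  }
}
```

So the interesting question is what failed the *first* time class X was
initialized on the executor.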



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-NoClassDefFoundError-is-this-a-bug-tp18972.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.




Re: Different versions of dependencies in assembly/target/scala-2.11/jars?

2016-09-17 Thread Sean Owen
No, these are different major versions of these components, each of
which gets used by something in the transitive dependency graph. They
are not redundant because they're not actually providing roughly the
same component in the same namespace.

However, the parquet-hadoop bit looks wrong, in that it should be
harmonized to one 1.x version. It's not that Spark uses inconsistent
versions, but that transitive deps do. We can still harmonize them in
the build if it causes problems.
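If harmonizing in the build is wanted, the conventional Maven mechanism is a
dependencyManagement pin in the parent POM. A sketch only -- the right
groupId and target version would need checking against Spark's actual poms:

```xml
<!-- Sketch: force one version of a transitively-pulled artifact.
     GroupId and version here are illustrative, not verified. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-hadoop</artifactId>
      <version>1.8.1</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Running something like ./build/mvn dependency:tree
-Dincludes='*:parquet-hadoop-bundle' from the source root should show which
module drags in the 1.6.0 bundle.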




Re: Different versions of dependencies in assembly/target/scala-2.11/jars?

2016-09-17 Thread Jacek Laskowski
Hi Sean,

Thanks a lot for helping me understand the different jars.

Do you think there's anything that should be reported as an
enhancement/issue/task in JIRA?

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski





Re: java.lang.NoClassDefFoundError, is this a bug?

2016-09-17 Thread Jacek Laskowski
Hi,

I'm surprised too. Here's the entire stack trace for reference. I'd
also like to know what causes the issue.

Caused by: java.lang.NoClassDefFoundError: Could not initialize class Main$
    at Main$$anonfun$main$1.apply(test.scala:14)
    at Main$$anonfun$main$1.apply(test.scala:14)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:277)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski





Re: java.lang.NoClassDefFoundError, is this a bug?

2016-09-17 Thread Xiang Gao
Besides, if you replace line #14 with:
Env.spark.createDataset(Seq("a","b","c")).rdd.map(func).collect()

You will have the same problem with a different stack trace:

Caused by: java.lang.NoClassDefFoundError: Could not initialize class Main$
    at Main$$anonfun$main$1.apply(test.scala:14)
    at Main$$anonfun$main$1.apply(test.scala:14)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
    at scala.collection.AbstractIterator.to(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1918)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1918)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)






Re: Fwd: Question regarding merging to two RDDs

2016-09-17 Thread WangJianfei
Maybe you can use a DataFrame, with the header file as the schema.
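A rough sketch of that idea in Spark 2.x. The file names and the
single-line-header assumption are hypothetical, not from the original
question:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("merge-header-data").getOrCreate()

// Read the header-only file and split its first line into column names.
val header = spark.sparkContext.textFile("header.csv").first().split(",")

// Read the data-only file without a header, then apply the column names.
val df = spark.read
  .option("header", "false")
  .csv("data.csv")
  .toDF(header: _*)   // renames the default _c0, _c1, ... columns

df.show()
```

This sidesteps union/zip entirely: the header never becomes a data row, it
just names the columns.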






Re: java.lang.NoClassDefFoundError, is this a bug?

2016-09-17 Thread WangJianfei
Do you run this in YARN mode or something else?






Re: java.lang.NoClassDefFoundError, is this a bug?

2016-09-17 Thread Xiang Gao
Spark standalone cluster.






Re: java.lang.NoClassDefFoundError, is this a bug?

2016-09-17 Thread WangJianfei
If I remove the abstract class A[T : Encoder] {}, it's OK!






Re: java.lang.NoClassDefFoundError, is this a bug?

2016-09-17 Thread Xiang Gao
Yes. Besides, if you change the "T : Encoder" to "T", it's OK too.
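For anyone following along, here is why dropping the context bound could
matter. This is a sketch of the desugaring, not the exact code from the
linked repo:

```scala
import org.apache.spark.sql.Encoder

// A context bound is syntactic sugar for an implicit constructor parameter:
abstract class A[T : Encoder]
// ...desugars to roughly:
//   abstract class A[T](implicit evidence: Encoder[T])
//
// So an object extending A[T] must resolve an Encoder[T] inside its own
// static initializer (Main$.<clinit>). If an executor triggers that
// initializer while deserializing a task closure and the resolution fails
// there, the initializer throws once, and every subsequent use of Main$
// surfaces only as "NoClassDefFoundError: Could not initialize class Main$".
```

With the bound changed to a plain "T", the initializer no longer needs an
Encoder at class-initialization time, which would be consistent with the
error disappearing.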


