Different versions of dependencies in assembly/target/scala-2.11/jars?

2016-09-17 Thread Jacek Laskowski
Hi,

Just noticed in assembly/target/scala-2.11/jars that similar libraries
have different versions:

-rw-r--r--  1 jacek  staff   1230201 17 wrz 09:51 netty-3.8.0.Final.jar
-rw-r--r--  1 jacek  staff   2305335 17 wrz 09:51 netty-all-4.0.41.Final.jar

and

-rw-r--r--  1 jacek  staff    218076 17 wrz 09:51 parquet-hadoop-1.8.1.jar
-rw-r--r--  1 jacek  staff   2796935 17 wrz 09:51 parquet-hadoop-bundle-1.6.0.jar

and

-rw-r--r--  1 jacek  staff     46983 17 wrz 09:51 jackson-annotations-2.6.5.jar
-rw-r--r--  1 jacek  staff    258876 17 wrz 09:51 jackson-core-2.6.5.jar
-rw-r--r--  1 jacek  staff    232248 17 wrz 09:51 jackson-core-asl-1.9.13.jar
-rw-r--r--  1 jacek  staff   1171380 17 wrz 09:51 jackson-databind-2.6.5.jar
-rw-r--r--  1 jacek  staff     18336 17 wrz 09:51 jackson-jaxrs-1.9.13.jar
-rw-r--r--  1 jacek  staff    780664 17 wrz 09:51 jackson-mapper-asl-1.9.13.jar
-rw-r--r--  1 jacek  staff     41263 17 wrz 09:51 jackson-module-paranamer-2.6.5.jar
-rw-r--r--  1 jacek  staff    515604 17 wrz 09:51 jackson-module-scala_2.11-2.6.5.jar
-rw-r--r--  1 jacek  staff     27084 17 wrz 09:51 jackson-xc-1.9.13.jar

and

-rw-r--r--  1 jacek  staff    188671 17 wrz 09:51 commons-beanutils-1.7.0.jar
-rw-r--r--  1 jacek  staff    206035 17 wrz 09:51 commons-beanutils-core-1.8.0.jar

and

-rw-r--r--  1 jacek  staff    445288 17 wrz 09:51 antlr-2.7.7.jar
-rw-r--r--  1 jacek  staff    164368 17 wrz 09:51 antlr-runtime-3.4.jar
-rw-r--r--  1 jacek  staff    302248 17 wrz 09:51 antlr4-runtime-4.5.3.jar

Even if that does not cause any class mismatches, it might be worth
excluding them to minimize the size of the Spark distro.

What do you think?

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Fwd: Question regarding merging to two RDDs

2016-09-17 Thread Hiral Mehta
Hi,

I have two separate CSV files, one with the header and the other with the
data. I read those two files into 2 different RDDs and now I need to merge
both RDDs.

I tried various options such as union, zip, and join, but none worked for my
problem. What is the best way to merge two RDDs so that the header and data
are combined into a new RDD?

Thanks,
Hiral Mehta


java.lang.NoClassDefFoundError, is this a bug?

2016-09-17 Thread Xiang Gao
Hi,

In my application, I got a weird error message:
java.lang.NoClassDefFoundError: Could not initialize class X

This happens only when I try to submit my application in cluster mode. It
works perfectly in client mode.

I'm able to reproduce this error message by a simple 16-line program:
https://github.com/zasdfgbnm/spark-test1/blob/master/src/main/scala/test.scala

To reproduce it, simply clone this git repo, and then execute a command like:
sbt package && spark-submit --master spark://localhost:7077
target/scala-2.11/createdataset_2.11-0.0.1-SNAPSHOT.jar

Can anyone check whether this is a bug in Spark?
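For what it's worth, the wording "Could not initialize class X" (as opposed
to "class not found") usually means the class *was* found, but its static
initializer already failed once. A minimal, Spark-free sketch of that JVM
behavior (hypothetical demo code, not taken from the linked repo):

```scala
object InitDemo {
  // First access to Bad runs its static initializer, which throws, producing
  // ExceptionInInitializerError with the real cause attached. Every later
  // access to the same (now erroneous) class yields
  // "NoClassDefFoundError: Could not initialize class InitDemo$Bad$",
  // hiding the original exception -- which is why the root cause is often
  // missing from executor-side stack traces.
  object Bad {
    private val zero = 0
    val v: Int = 1 / zero   // throws ArithmeticException during <clinit>
  }

  def main(args: Array[String]): Unit = {
    try { Bad.v } catch { case t: Throwable => println(t) } // ExceptionInInitializerError
    try { Bad.v } catch { case t: Throwable => println(t) } // NoClassDefFoundError
  }
}
```

So the interesting question is what failed the *first* time class X was
initialized on the executor.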



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-NoClassDefFoundError-is-this-a-bug-tp18972.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.




Re: Different versions of dependencies in assembly/target/scala-2.11/jars?

2016-09-17 Thread Sean Owen
No, these are different major versions of these components, each of
which gets used by something in the transitive dependency graph. They
are not redundant because they're not actually providing roughly the
same component in the same namespace.

However, the parquet-hadoop bit looks wrong, in that it should be
harmonized to one 1.x version. It's not that Spark uses inconsistent
versions, but that transitive deps do. We can still harmonize them in
the build if it causes problems.
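If harmonizing in the build is wanted, the conventional Maven mechanism is a
dependencyManagement pin in the parent POM. A sketch only -- the right
groupId and target version would need checking against Spark's actual poms:

```xml
<!-- Sketch: force one version of a transitively-pulled artifact.
     GroupId and version here are illustrative, not verified. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-hadoop</artifactId>
      <version>1.8.1</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Running something like ./build/mvn dependency:tree
-Dincludes='*:parquet-hadoop-bundle' from the source root should show which
module drags in the 1.6.0 bundle.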




Re: Different versions of dependencies in assembly/target/scala-2.11/jars?

2016-09-17 Thread Jacek Laskowski
Hi Sean,

Thanks a lot for helping me understand the different jars.

Do you think there's anything that should be reported as an
enhancement/issue/task in JIRA?

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski





Re: java.lang.NoClassDefFoundError, is this a bug?

2016-09-17 Thread Jacek Laskowski
Hi,

I'm surprised too. Here's the entire stack trace for reference. I'd
also like to know what causes the issue.

Caused by: java.lang.NoClassDefFoundError: Could not initialize class Main$
    at Main$$anonfun$main$1.apply(test.scala:14)
    at Main$$anonfun$main$1.apply(test.scala:14)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:277)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski





Re: java.lang.NoClassDefFoundError, is this a bug?

2016-09-17 Thread Xiang Gao
Besides, if you replace line #14 with:
Env.spark.createDataset(Seq("a","b","c")).rdd.map(func).collect()

You will have the same problem with a different stack trace:

Caused by: java.lang.NoClassDefFoundError: Could not initialize class Main$
    at Main$$anonfun$main$1.apply(test.scala:14)
    at Main$$anonfun$main$1.apply(test.scala:14)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
    at scala.collection.AbstractIterator.to(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1918)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1918)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)






Re: Fwd: Question regarding merging to two RDDs

2016-09-17 Thread WangJianfei
Maybe you can use a DataFrame, with the header file as the schema.
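A rough sketch of that idea in Spark 2.x. The file names and the
single-line-header assumption are hypothetical, not from the original
question:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("merge-header-data").getOrCreate()

// Read the header-only file and split its first line into column names.
val header = spark.sparkContext.textFile("header.csv").first().split(",")

// Read the data-only file without a header, then apply the column names.
val df = spark.read
  .option("header", "false")
  .csv("data.csv")
  .toDF(header: _*)   // renames the default _c0, _c1, ... columns

df.show()
```

This sidesteps union/zip entirely: the header never becomes a data row, it
just names the columns.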






Re: java.lang.NoClassDefFoundError, is this a bug?

2016-09-17 Thread WangJianfei
Do you run this in YARN mode or something else?






Re: java.lang.NoClassDefFoundError, is this a bug?

2016-09-17 Thread Xiang Gao
Spark standalone cluster.






Re: java.lang.NoClassDefFoundError, is this a bug?

2016-09-17 Thread WangJianfei
If I remove the abstract class A[T : Encoder] {}, it's OK!






Re: java.lang.NoClassDefFoundError, is this a bug?

2016-09-17 Thread Xiang Gao
Yes. Besides, if you change the "T : Encoder" to "T", it's OK too.
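For anyone following along, here is why dropping the context bound could
matter. This is a sketch of the desugaring, not the exact code from the
linked repo:

```scala
import org.apache.spark.sql.Encoder

// A context bound is syntactic sugar for an implicit constructor parameter:
abstract class A[T : Encoder]
// ...desugars to roughly:
//   abstract class A[T](implicit evidence: Encoder[T])
//
// So an object extending A[T] must resolve an Encoder[T] inside its own
// static initializer (Main$.<clinit>). If an executor triggers that
// initializer while deserializing a task closure and the resolution fails
// there, the initializer throws once, and every subsequent use of Main$
// surfaces only as "NoClassDefFoundError: Could not initialize class Main$".
```

With the bound changed to a plain "T", the initializer no longer needs an
Encoder at class-initialization time, which would be consistent with the
error disappearing.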


