Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-09-29 Thread Rick Moritz
difference in the amount of data that gets shuffled/spilled (which leads to much earlier OOM-conditions), when using spark-shell. What could be the reason for this different behaviour using very similar configurations and identic

Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-09-28 Thread Kartik Mathur
sources) and identical spark binaries? Why would code launched from spark-shell generate more shuffled data for the same number of shuffled tuples? An analysis would be much appreciated. Best,

Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-09-28 Thread Rick Moritz
ick On Wed, Aug 19, 2015 at 2:47 PM, Rick Moritz wrote: > oops, forgot to reply-all on this thread. > -- Forwarded message -- > From: Rick Moritz > Date: Wed, Aug 19, 2015 at 2:46 PM > Subject: R

Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-09-28 Thread Kartik Mathur
r the same number of shuffled tuples? An analysis would be much appreciated. Best, Rick On Wed, Aug 19, 2015 at 2:47 PM, Rick Moritz wrote: > oops, forgot to reply-all on this thread. > ------ Forwarded message ------

Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-09-27 Thread Rick Moritz
z Date: Wed, Aug 19, 2015 at 2:46 PM Subject: Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell To: Igor Berman Those values are not explicitly set, and attempting to read their values results in 'java.util.NoSuchElementException:

Fwd: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-08-19 Thread Rick Moritz
oops, forgot to reply-all on this thread. -- Forwarded message -- From: Rick Moritz Date: Wed, Aug 19, 2015 at 2:46 PM Subject: Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell To: Igor Berman Those values are not explicitly set, and attempting to read

Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-08-19 Thread Igor Berman
I would compare the Spark UI metrics for both cases and look for any differences (number of partitions, number of spills, etc.). Why can't you make the REPL consistent with the Zeppelin Spark version? The RC might have issues... On 19 August 2015 at 14:42, Rick Moritz wrote: > No, the setup is one driver wit

Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-08-19 Thread Rick Moritz
No, the setup is one driver with 32g of memory, and three executors each with 8g of memory in both cases. No core-number has been specified, thus it should default to single-core (though I've seen the yarn-owned jvms wrapping the executors take up to 3 cores in top). That is, unless, as I suggested

Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-08-19 Thread Igor Berman
Any differences in the number of cores or memory settings for the executors? On 19 August 2015 at 09:49, Rick Moritz wrote: > Dear list, > I am observing a very strange difference in behaviour between a Spark 1.4.0-rc4 REPL (locally compiled with Java 7) and a Spark 1.4.0 zeppelin interpreter (comp

Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-08-19 Thread Rick Moritz
Creating a franken-jar and replacing the differing .class in my spark-assembly with the one compiled with java 1.6 appears to make no significant difference with regard to the generated shuffle-volume. I will try using FAIR-scheduling from the shell after the spark-submit test, to see if that has a

Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-08-18 Thread Rick Moritz
Dear list, I am observing a very strange difference in behaviour between a Spark 1.4.0-rc4 REPL (locally compiled with Java 7) and a Spark 1.4.0 zeppelin interpreter (compiled with Java 6 and sourced from maven central). The workflow loads data from Hive, applies a number of transformations (incl

Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-08-18 Thread Rick Moritz
Dear list, I am observing a very strange behaviour between a Spark 1.4.0-rc4 REPL (locally compiled with Java 7) and a Spark 1.4.0 zeppelin interpreter (compiled with Java 6 and sourced from maven central). The workflow loads data from Hive, applies a number of transformations (including quite a