>>>> difference in the amount of data that gets shuffled/spilled (which
>>>> leads to much earlier OOM-conditions), when using spark-shell.
>>>> What could be the reason for this different behaviour using very
>>>> similar configurations, identical sources, and identical spark binaries?
>>>> Why would code launched from spark-shell generate more shuffled data
>>>> for the same number of shuffled tuples?
>>>>
>>>> An analysis would be much appreciated.
>>>>
>>>> Best,
>>>>
>>>> Rick
>>
>> On Wed, Aug 19, 2015 at 2:47 PM, Rick Moritz wrote:
>>
>>> oops, forgot to reply-all on this thread.
>>>
>>> -- Forwarded message --
>>> From: Rick Moritz
>>> Date: Wed, Aug 19, 2015 at 2:46 PM
>>> Subject: Re: Strange shuffle behaviour difference between Zeppelin and
>>> Spark-shell
>>> To: Igor Berman
>>>
>>> Those values are not explicitly set, and attempting to read their values
>>> results in 'java.util.NoSuchElementException:
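For context: SparkConf.get(key) throws exactly this java.util.NoSuchElementException
when a key is neither explicitly set nor has a default, so an unset key is not the
same as a key holding its default value. A minimal sketch of probing such keys safely
from the spark-shell, assuming the usual sc handle; the key names below are only
examples, not the ones from the original exception:

  // Probe configuration keys without triggering NoSuchElementException:
  // getOption returns None for unset keys instead of throwing.
  val conf = sc.getConf

  // Example keys -- substitute whatever settings you want to compare
  // between the spark-shell and the Zeppelin interpreter.
  val keysToCheck = Seq(
    "spark.shuffle.compress",
    "spark.shuffle.spill.compress",
    "spark.serializer")

  keysToCheck.foreach { key =>
    println(s"$key -> ${conf.getOption(key).getOrElse("<not set>")}")
  }

  // Dump everything that *is* explicitly set, for a side-by-side diff:
  conf.getAll.sorted.foreach { case (k, v) => println(s"$k=$v") }

Comparing the two getAll dumps is usually the quickest way to spot a setting that
silently differs between the two environments.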
I would compare the Spark UI metrics for both cases and see if there are any
differences (number of partitions, number of spills, etc.).
Why can't you make the REPL consistent with the Zeppelin Spark version?
Might be that the RC has issues...
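Besides eyeballing the UI, the same shuffle-write and spill totals can be collected
programmatically and diffed between the two environments. A minimal sketch against
the Spark 1.x listener API (the TaskMetrics field names changed in later releases),
to be pasted into either shell before running the workflow:

  import java.util.concurrent.atomic.AtomicLong
  import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

  // Accumulate per-task shuffle-write and spill bytes, the same quantities
  // shown in the Spark UI, so the spark-shell and Zeppelin runs can be
  // compared number by number.
  val shuffleWriteBytes = new AtomicLong(0L)
  val spilledBytes = new AtomicLong(0L)

  sc.addSparkListener(new SparkListener {
    override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
      val metrics = taskEnd.taskMetrics
      if (metrics != null) {
        // shuffleWriteMetrics is an Option in Spark 1.x.
        metrics.shuffleWriteMetrics.foreach { w =>
          shuffleWriteBytes.addAndGet(w.shuffleBytesWritten)
        }
        spilledBytes.addAndGet(metrics.memoryBytesSpilled + metrics.diskBytesSpilled)
      }
    }
  })

  // ... run the workflow, then:
  println(s"shuffle write: ${shuffleWriteBytes.get} B, spilled: ${spilledBytes.get} B")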
On 19 August 2015 at 14:42, Rick Moritz wrote:
No, the setup is one driver with 32g of memory, and three executors, each
with 8g of memory, in both cases. No core count has been specified, so it
should default to a single core (though I've seen the YARN-owned JVMs
wrapping the executors take up to 3 cores in top). That is, unless, as I
suggested ...
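One way to take the resource side out of the equation is to pin the values that are
currently left at their defaults, so both environments demonstrably run with identical
settings. A sketch using standard Spark-on-YARN properties and the figures mentioned
above; in spark-shell or Zeppelin these must be in place before the SparkContext is
created (spark-defaults.conf or the interpreter settings), so the snippet shows the
standalone-app form with a placeholder application name:

  import org.apache.spark.{SparkConf, SparkContext}

  // Pin driver/executor resources explicitly instead of relying on defaults,
  // so the spark-shell and Zeppelin runs cannot silently diverge.
  val conf = new SparkConf()
    .setAppName("shuffle-comparison")       // placeholder name
    .set("spark.driver.memory", "32g")      // in client mode this must go to
                                            // spark-submit itself, since the
                                            // driver JVM is already running here
    .set("spark.executor.memory", "8g")
    .set("spark.executor.cores", "1")
    .set("spark.executor.instances", "3")   // YARN-specific setting

  val sc = new SparkContext(conf)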
Any differences in number of cores or memory settings for the executors?
On 19 August 2015 at 09:49, Rick Moritz wrote:
> Dear list,
>
> I am observing a very strange difference in behaviour between a Spark
> 1.4.0-rc4 REPL (locally compiled with Java 7) and a Spark 1.4.0 zeppelin
> interpreter (compiled with Java 6 and sourced from maven central).
Creating a franken-jar and replacing the differing .class in my
spark-assembly with the one compiled with Java 1.6 appears to make no
significant difference with regard to the generated shuffle volume.
I will try using FAIR scheduling from the shell after the spark-submit test,
to see if that has an effect.
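For reference, FAIR scheduling is controlled by spark.scheduler.mode, which has to be
set before the SparkContext starts (e.g. in spark-defaults.conf). A small sketch of the
shell-side part of such a test; the pool name is only a placeholder:

  // FAIR mode itself is a context-level setting (spark.scheduler.mode=FAIR).
  // From a running shell you can only check it and choose a pool for
  // subsequent jobs.
  println(sc.getConf.getOption("spark.scheduler.mode").getOrElse("FIFO (default)"))

  // Route the jobs of this thread into a named pool (placeholder name):
  sc.setLocalProperty("spark.scheduler.pool", "shuffle-test")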
Dear list,

I am observing a very strange difference in behaviour between a Spark
1.4.0-rc4 REPL (locally compiled with Java 7) and a Spark 1.4.0 zeppelin
interpreter (compiled with Java 6 and sourced from maven central).
The workflow loads data from Hive, applies a number of transformations
(including quite a ...
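The kind of workflow described, reading from Hive and then applying shuffle-heavy
transformations, corresponds roughly to the following sketch against the Spark 1.4
HiveContext API; the table and column names are placeholders, not the ones from the
original job:

  import org.apache.spark.sql.hive.HiveContext

  // Load from Hive and apply a shuffle-producing aggregation. In spark-shell
  // and Zeppelin builds with Hive support, sqlContext is already a
  // HiveContext, so creating one here is only needed in a standalone app.
  val hiveContext = new HiveContext(sc)

  val events = hiveContext.sql(
    "SELECT user_id, amount FROM some_db.some_table")   // placeholder table

  // A groupBy/aggregation like this is a typical source of shuffle volume.
  val perUser = events.groupBy("user_id").sum("amount")

  perUser.count()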