Hi Patrick,
Thank you very much for your response. I am almost there, but I am not sure
about my conclusion. Let me try to approach it from a different angle.
I would like to time the impact of a particular lambda function or, if
possible, more broadly measure the impact of any map function. I am trying
to understand the data and computation flow in Spark. I believe I understand
the Shuffle fairly well (both the map and reduce side), but I do not see what
happens to the computation in the map stages. I know all maps get pipelined
up to the shuffle (when there is no other action in bet
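One way to time a particular lambda independently of Spark's own instrumentation is to wrap the function so every call adds its duration to a counter. Below is a minimal plain-Scala sketch (the object and names are illustrative; in a real Spark job you would use a Spark accumulator rather than an AtomicLong, since the closure is serialized to the executors):

```scala
import java.util.concurrent.atomic.AtomicLong

object TimeMapFunction {
  // Accumulated wall-clock time spent inside the wrapped lambda, in nanoseconds.
  val lambdaNanos = new AtomicLong(0L)

  // Wrap any function so that each invocation adds its duration to the counter.
  def timed[A, B](f: A => B): A => B = { a =>
    val start = System.nanoTime()
    try f(a)
    finally lambdaNanos.addAndGet(System.nanoTime() - start)
  }

  def main(args: Array[String]): Unit = {
    val double = timed((x: Int) => x * 2)
    val result = (1 to 1000).map(double)
    println(s"sum = ${result.sum}, lambda time = ${lambdaNanos.get} ns")
  }
}
```

In Spark you would pass the wrapped function to rdd.map and read the accumulator after the action completes; because maps are pipelined, this isolates the lambda's cost from the rest of the stage.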
Hi Reynold,
It took me some time, but I've finally found that there is a difference
between spilling on the map-side and spilling on the reduce-side for a
shuffle. Spilling to disk on the map-side happens by default (with the
spillToPartitionFiles call from insertAll in ExternalSorter; don't know
apply? How much memory does the web UI say is available?
>
> BTW - I don't think any JVM can actually handle 700G heap ... (maybe Zing).
>
> On Thu, Mar 12, 2015 at 4:09 PM, Tom Hubregtsen
> wrote:
>
>> Hi all,
>>
>> I'm running the teraSort benchmark
Hi all,
I'm running the teraSort benchmark with a relatively small input set: 5 GB.
During profiling, I can see that I am using a total of 68 GB. I've got a terabyte
of memory in my system, and set
spark.executor.memory 900g
spark.driver.memory 900g
I use the default for
spark.shuffle.memoryFraction
spar
Hi all,
I would like to validate my understanding of memory regions in Spark. Any
comments on my description below would be appreciated!
Execution is split into stages, based on wide dependencies between RDDs and
on actions such as save. All transformations involving narrow dependencies
before th
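To illustrate the stage split described above, here is a minimal sketch (assuming a local SparkContext; the names are illustrative): map is a narrow dependency and stays pipelined inside its stage, while reduceByKey introduces a wide dependency and hence a stage boundary, which toDebugString makes visible:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object StageBoundaries {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("stages"))

    val words  = sc.parallelize(Seq("a", "b", "a", "c"))
    val pairs  = words.map(w => (w, 1))    // narrow: pipelined into the same stage
    val counts = pairs.reduceByKey(_ + _)  // wide: shuffle, so a new stage begins

    // toDebugString marks the shuffle boundary; everything above it
    // belongs to a separate stage.
    println(counts.toDebugString)
    sc.stop()
  }
}
```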
Hi,
I ran the same version of a program with two different types of input
containing equivalent information.
Program 1: 10,000 files, each with on average 50 IDs, one ID per line
Program 2: 1 file containing 10,000 lines, with on average 50 IDs per line
My program takes the input, creates key/value pairs of
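The post is cut off before saying what the key/value pairs contain. Purely as a hypothetical illustration of the two input layouts, here is plain-Scala parsing of both into (ID, 1) pairs; in Spark the same logic would sit inside map/flatMap over sc.textFile:

```scala
object ParseInputs {
  // Layout 1: many files, one ID per line.
  def pairsFromFileLines(lines: Seq[String]): Seq[(String, Int)] =
    lines.map(id => (id.trim, 1))

  // Layout 2: one file, many whitespace-separated IDs per line.
  def pairsFromWideLines(lines: Seq[String]): Seq[(String, Int)] =
    lines.flatMap(_.split("\\s+")).filter(_.nonEmpty).map(id => (id, 1))

  def main(args: Array[String]): Unit = {
    val perLine = pairsFromFileLines(Seq("id1", "id2"))
    val perRow  = pairsFromWideLines(Seq("id1 id2"))
    // true: equivalent information yields the same pairs
    println(perLine == perRow)
  }
}
```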
Use unpersist(), even when not persisted before.
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/memory-size-for-caching-RDD-tp8256p8579.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
Also, if I am not mistaken, this data is automatically removed after your
run, so be sure to check it while your program is still running.
As I've mentioned before, I am currently writing my master's thesis on
storage and memory usage in Spark. I am specifically looking at the
different fractions of memory:
I was able to find 3 memory regions, but that seems to leave some memory
unaccounted for:
1. spark.shuffle.memoryFraction: 20%
2. sp
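For reference, the legacy (pre-1.6) defaults for these fraction properties, written as a spark-defaults.conf fragment. The list above is cut off after item 1, so which regions it went on to name is not recoverable; the entries below are only the defaults I am certain of for Spark 1.x:

```
spark.shuffle.memoryFraction   0.2   # shuffle aggregation buffers
spark.storage.memoryFraction   0.6   # cached RDD blocks
# the rest of the heap is left for user objects and JVM overhead
```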
Hi all,
Just one line of context, since the last post mentioned this would help:
I'm currently writing my master's thesis (Computer Engineering) on storage
and memory in both Spark and Hadoop.
Right now I'm trying to analyze the spilling behavior of Spark, and I do not
see what I expect. Therefore, I w
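One way to check whether spilling actually happened is to scan the executor logs for the spill message. In Spark 1.x, ExternalAppendOnlyMap and ExternalSorter log lines like the sample below (assumptions: the exact wording varies across versions, and the log path, typically $SPARK_HOME/work/<app>/<executor>/stderr in standalone mode, depends on the deploy mode). A small sketch:

```scala
object CountSpills {
  // Count log lines reporting an in-memory map spilled to disk.
  def countSpills(logLines: Seq[String]): Int =
    logLines.count(_.contains("Spilling in-memory map"))

  def main(args: Array[String]): Unit = {
    // Simulated log lines; in practice, read the executor stderr files.
    val sample = Seq(
      "INFO ExternalAppendOnlyMap: Spilling in-memory map of 512 MB to disk (3 times so far)",
      "INFO TaskSetManager: Finished task 1.0 in stage 0.0")
    println(countSpills(sample))  // prints 1
  }
}
```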