from:"cheez"

AmpLab Big Data Benchmark for Spark error on EC2

2016-02-11 Thread cheez

I am trying to run the Big Data Benchmark on my EC2 cluster against my own fork of Spark 1.5, which only modifies some files in Spark core. My cluster contains 1 master and 2 slave nodes of type m1.large. I use the ec2 scripts bundled with Spark t

Get bucket details created in shuffle phase

2015-08-07 Thread cheez

Hey all. I was trying to understand Spark internals by looking into (and hacking) the code. I was trying to explore the buckets that are generated when the output of each map task is partitioned, which the reduce side then fetches on the basis of partitionId. I went into the write() method of