Re: Breaking the previous large-scale sort record with Spark

2014-10-13 Thread Ilya Ganelin
> Even sending info about every map's output size
> to each reducer was a problem, so Reynold has a patch that avoids that if
> the number of tasks is large.
>
> Matei
>
> On Oct 10, 2014, at 10:09 PM, Ilya Ganelin wrote:
>
> > Hi Matei - I read your post with great inte

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Ilya Ganelin
Hi Matei - I read your post with great interest. Could you possibly comment in more depth on some of the issues you guys saw when scaling up Spark and how you resolved them? I am interested specifically in Spark-related problems. I'm working on scaling up Spark to very large datasets and have been