Re: Spark runs applications in an inconsistent way

2014-04-23 Thread Aureliano Buendia
Yes, things get more unstable with larger data. But that's the whole point of my question: why should Spark get unstable when data gets larger? When data gets larger, Spark should get *slower*, not more unstable. Lack of stability makes parameter tuning very difficult, time-consuming and a painf

Re: Spark runs applications in an inconsistent way

2014-04-23 Thread Andras Barjak
> - Spark UI shows number of succeeded tasks is more than total number of tasks, e.g. 3500/3000. There are no failed tasks. At this stage the computation keeps carrying on for a long time without returning an answer.
No sign of resubmitted tasks in the command line logs either? Yo

Re: Spark runs applications in an inconsistent way

2014-04-23 Thread Mayur Rustagi
Very abstract. EC2 is an unlikely culprit. What are you trying to do? Spark is typically not inconsistent like that, but huge intermediate data or reduce-size issues could be involved. It is hard to help without more detail on what you are trying to achieve. Mayur Rustagi Ph: +1 (760) 203 3257 http:/

Spark runs applications in an inconsistent way

2014-04-22 Thread Aureliano Buendia
Hi, The very same Spark application binary sometimes behaves differently with every execution. - The Ganglia profile is different with every execution: sometimes it takes 0.5 TB of memory, the next time it takes 1 TB of memory, the next time it is 0.75 TB... - Spark UI shows