Yes, things get more unstable with larger data. But that's the whole point
of my question:
Why should Spark get unstable when data gets larger?
When data gets larger, Spark should get *slower*, not more unstable. Lack
of stability makes parameter tuning very difficult, time-consuming, and a
pain.
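To give a sense of what I mean by parameter tuning, these are the kinds of
settings I end up changing between runs (a sketch only -- the values here are
illustrative, not from a working configuration):

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values only -- nothing here comes from my actual job.
val conf = new SparkConf()
  .setAppName("my-job")
  .set("spark.executor.memory", "20g")          // per-executor heap size
  .set("spark.default.parallelism", "2000")     // partitions used for shuffles
  .set("spark.storage.memoryFraction", "0.4")   // split between cache and working memory
val sc = new SparkContext(conf)

Every run I end up guessing at these again, because the behaviour is
different each time even though the binary and the data are the same.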
>
>- Spark UI shows the number of succeeded tasks is greater than the total
>number of tasks, e.g. 3500/3000. There are no failed tasks. At this stage
>the computation keeps running for a long time without returning an answer.
>
> No sign of resubmitted tasks in the command line logs either?
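One way to check for resubmissions explicitly (a sketch only -- the listener
class name is made up, but SparkListener, onTaskEnd and the Resubmitted end
reason are standard Spark APIs) is to register a listener that prints them:

import org.apache.spark.Resubmitted
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Prints a line for every task the scheduler resubmits.
class ResubmitLogger extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    taskEnd.reason match {
      case Resubmitted =>
        println(s"Task ${taskEnd.taskInfo.taskId} was resubmitted")
      case _ => // ignore successes and other failure reasons
    }
  }
}

// Register it on the existing SparkContext before running the job:
// sc.addSparkListener(new ResubmitLogger)

That would at least confirm whether the extra succeeded tasks (3500/3000)
come from resubmissions.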
Very abstract.
EC2 is an unlikely culprit.
What are you trying to do? Spark is typically not inconsistent like that,
but huge intermediate data or reduce-side sizing issues could be involved.
It's hard to help without more detail on what you are trying to achieve.
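For example, a pattern like this (just a sketch -- the input path, key field
and partition count are made up, since I don't know your job) is the kind of
thing where intermediate data blows up on the reduce side:

import org.apache.spark.{SparkConf, SparkContext}

object ShuffleSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("shuffle-sketch"))

    // Hypothetical input: one (key, 1) pair per CSV line.
    val pairs = sc.textFile("hdfs:///data/events")
      .map(line => (line.split(",")(0), 1L))

    // groupByKey ships every value for a key to a single reducer, so a few
    // hot keys can blow up intermediate data and reducer memory:
    // val counts = pairs.groupByKey().mapValues(_.sum)

    // reduceByKey combines on the map side first and lets you raise the
    // partition count, which shrinks and spreads the shuffled data:
    val counts = pairs.reduceByKey(_ + _, 2000)

    counts.saveAsTextFile("hdfs:///out/counts")
    sc.stop()
  }
}

If your job looks more like the groupByKey case, that alone could help
explain very different memory profiles from run to run, depending on which
nodes the heavy partitions land on.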
Mayur Rustagi
Ph: +1 (760) 203 3257
http:/
Hi,
Sometimes the very same Spark application binary behaves differently with
every execution.
- The Ganglia profile is different with every execution: sometimes it
takes 0.5 TB of memory, the next time it takes 1 TB of memory, the next
time it is 0.75 TB...
- Spark UI shows the number of succeeded tasks is greater than the total
number of tasks, e.g. 3500/3000. There are no failed tasks. At this stage
the computation keeps running for a long time without returning an answer.