I'm very new to Apache Spark. I'm just a user, not a developer. I'm running a cluster with many spot instances. Am I correct in understanding that Spark can handle an unlimited number of spot instance failures and restarts? Sometimes all the spot instances disappear without warning, and then they come back. Can I trust Spark to pick up all jobs where it left off?
I'm noticing some instability in my system, and I suspect it could be disk or RAM issues. When I add a lot of slaves, I run low on RAM on my master. Maybe that's part of the problem, but I just want to confirm my understanding.