Yeah, we badly need new AMIs that include, at a minimum, package/security updates and Python 2.7. There is an open issue tracking the 2.7 AMI update, at least: <https://issues.apache.org/jira/browse/SPARK-922>.
On Thu, Jun 12, 2014 at 3:34 PM, <unorthodox.engine...@gmail.com> wrote:

> Creating AMIs from scratch is a complete pain in the ass. If you have a
> spare week, sure. I understand why the team avoids it.
>
> The easiest way is probably to spin up a working instance and then use
> Amazon's "save as new AMI", but that has some major limitations, especially
> with software not expecting it. ("There are two of me now!") Worker nodes
> might cope better than the master.
>
> But yes, I also would love new AMIs that don't pull down 200 MB every
> time I spin up. ("Spin up a cluster in five minutes" HA!) AMIs per region
> are also good for costs. I've thought of doing up new ones (since I have
> experience), but I have no time and other issues first. Perhaps once I know
> Spark better.
>
> At least with Spark, we have more control over the scripts precisely
> because they are "primitive". I had a quick look at YARN/Ambari, and it
> wasn't obvious they were any better with EC2, at a hundred times the
> complexity.
>
> I expect most AWS-heavy companies have a full-time person just managing
> AMIs. They are that annoying. It's what makes Cloudera attractive.
>
> Jeremy Lee BCompSci (Hons)
> The Unorthodox Engineers
>
> On 6 Jun 2014, at 6:44 am, Matt Work Coarr <mattcoarr.w...@gmail.com>
> wrote:
>
> How would I go about creating a new AMI image that I can use with the
> spark-ec2 commands? I can't seem to find any documentation. I'm looking
> for a list of steps that I'd need to perform to make an Amazon Linux image
> ready to be used by the spark-ec2 tools.
>
> I've been reading through the Spark 1.0.0 documentation, looking at the
> script itself (spark_ec2.py), and looking at the GitHub project
> mesos/spark-ec2.
>
> From what I can tell, the spark_ec2.py script looks up the ID of the AMI
> based on the region and machine type (hvm or pvm) using static content
> derived from the GitHub repo mesos/spark-ec2.
>
> The spark-ec2 script loads the AMI ID from this base URL:
> https://raw.github.com/mesos/spark-ec2/v2/ami-list
> (which presumably comes from https://github.com/mesos/spark-ec2)
>
> For instance, since I'm working with us-east-1 and pvm, I'd end up with
> AMI ID ami-5bb18832.
>
> Is there a list of instructions for how this AMI was created? Assuming
> I'm starting with my own Amazon Linux image, what would I need to do to
> make it usable so that I could pass that AMI ID to spark_ec2.py rather
> than using the default Spark-provided AMI?
>
> Thanks,
> Matt
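
For reference, here is a minimal sketch of the AMI lookup Matt describes:
the script builds a URL of the form <base-url>/<region>/<pvm|hvm> and reads
a single AMI ID back. The helper name lookup_spark_ami is illustrative, not
the actual function in spark_ec2.py, and the error handling is simplified:

    # Python 2, matching the era of spark_ec2.py.
    import urllib2

    # Base URL of the static ami-list in the mesos/spark-ec2 repo.
    AMI_PREFIX = "https://raw.github.com/mesos/spark-ec2/v2/ami-list"

    def lookup_spark_ami(region, instance_type):
        """Fetch the default Spark AMI ID for a region and a
        virtualization type ("pvm" or "hvm")."""
        ami_path = "%s/%s/%s" % (AMI_PREFIX, region, instance_type)
        # Each file in ami-list contains just the AMI ID for that combo.
        return urllib2.urlopen(ami_path).read().strip()

    print lookup_spark_ami("us-east-1", "pvm")  # e.g. ami-5bb18832

Note also that spark_ec2.py accepts an --ami option, so once you have built
a custom image you should be able to pass its ID on the command line and
bypass this lookup entirely, without editing the static ami-list.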