Yeah, we badly need new AMIs that include, at a minimum, package/security
updates and Python 2.7. There is an open issue tracking the 2.7 AMI update,
at least: <https://issues.apache.org/jira/browse/SPARK-922>.


On Thu, Jun 12, 2014 at 3:34 PM, <unorthodox.engine...@gmail.com> wrote:

> Creating AMIs from scratch is a complete pain in the ass. If you have a
> spare week, sure. I understand why the team avoids it.
>
> The easiest way is probably to spin up a working instance and then use
> Amazons "save as new AMI", but that has some major limitations, especially
> with software not expecting it. ("There are two of me now!") Worker nodes
> might cope better than the master.
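>
> For what it's worth, that "save as new AMI" step can at least be scripted.
> A rough sketch with boto (the same library spark_ec2.py uses); the
> instance ID and names below are placeholders, so adjust to taste:
>
>     from boto import ec2
>
>     # Connect to the region where the configured instance lives.
>     conn = ec2.connect_to_region("us-east-1")
>
>     # Snapshot a running, already-configured instance into a reusable AMI.
>     # By default create_image reboots the instance so the snapshot is clean.
>     ami_id = conn.create_image(
>         instance_id="i-00000000",        # placeholder: your configured node
>         name="my-spark-node-ami",        # placeholder AMI name
>         description="Pre-baked Spark node image",
>     )
>     print ami_id
>
> It doesn't fix the "two of me" problem, but it beats clicking through the
> console every time.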
>
> But yes, I also would love new AMIs that don't pull down 200 meg every
> time I spin up.
> ("Spin up a cluster in five minutes" HA!) Also, AMIs per region are also
> good for costs. I've thought of doing up new ones, (since I have
> experience) but I have no time and other issues first. Perhaps once I know
> Spark better.
>
> At least with Spark, we have more control over the scripts precisely because
> they are "primitive". I had a quick look at YARN/Ambari, and it wasn't
> obvious they were any better with EC2, at a hundred times the complexity.
>
> I expect most AWS-heavy companies have a full-time person just managing
> AMIs. They are that annoying. It's what makes Cloudera attractive.
>
> Jeremy Lee   BCompSci (Hons)
> The Unorthodox Engineers
>
> On 6 Jun 2014, at 6:44 am, Matt Work Coarr <mattcoarr.w...@gmail.com>
> wrote:
>
> How would I go about creating a new AMI that I can use with the
> spark-ec2 commands? I can't seem to find any documentation. I'm looking
> for the list of steps I'd need to perform to make an Amazon Linux image
> ready to be used by the spark-ec2 tools.
>
> I've been reading through the Spark 1.0.0 documentation, looking at the
> script itself (spark_ec2.py), and looking at the GitHub project
> mesos/spark-ec2.
>
> From what I can tell, the spark_ec2.py script looks up the ID of the AMI
> based on the region and machine type (hvm or pvm) using static content
> derived from the GitHub repo mesos/spark-ec2.
>
> The spark_ec2.py script loads the AMI ID from this base URL:
> https://raw.github.com/mesos/spark-ec2/v2/ami-list
> (which presumably comes from https://github.com/mesos/spark-ec2 )
>
> For instance, since I'm working with us-east-1 and pvm, I'd end up with
> AMI ID ami-5bb18832.
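>
> Roughly, I think the lookup amounts to something like this (my
> approximation from skimming the script, not the actual code):
>
>     import urllib2
>
>     AMI_PREFIX = "https://raw.github.com/mesos/spark-ec2/v2/ami-list"
>
>     def resolve_ami(region, machine_type):
>         # The repo keeps one file per region/type pair; the file body
>         # is just the AMI ID for that combination.
>         url = "%s/%s/%s" % (AMI_PREFIX, region, machine_type)
>         return urllib2.urlopen(url).read().strip()
>
>     print resolve_ami("us-east-1", "pvm")  # prints ami-5bb18832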
>
> Is there a list of instructions for how this AMI was created? Assuming
> I'm starting with my own Amazon Linux image, what would I need to do to
> make it usable so that I could pass its AMI ID to spark_ec2.py rather than
> using the default Spark-provided AMI?
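>
> (One thing I did spot: spark_ec2.py appears to accept an -a/--ami option,
> so presumably once I had a working image I could launch with something
> like
>
>     ./spark-ec2 -k my-keypair -i my-keypair.pem -a ami-XXXXXXXX launch test-cluster
>
> where the key pair, identity file, and AMI ID are placeholders. What I
> can't tell is what needs to be on the image for the rest of the setup
> scripts to succeed.)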
>
> Thanks,
> Matt
>
>
