Creating AMIs from scratch is a complete pain in the ass. If you have a spare 
week, sure. I understand why the team avoids it.

The easiest way is probably to spin up a working instance and then use Amazon's 
"save as new AMI" feature, but that has some major limitations, especially for 
software that doesn't expect to be cloned. ("There are two of me now!") Worker 
nodes might cope with it better than the master would.
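
(For what it's worth, the "save as AMI" step itself is only a couple of lines 
with boto, the library spark_ec2.py already uses. A rough sketch; the instance 
id, region and names below are made up, and it assumes AWS credentials are 
already configured in the environment or ~/.boto:)

    # Rough sketch with boto 2.x; instance id, region and names are placeholders.
    import boto.ec2

    conn = boto.ec2.connect_to_region("us-east-1")

    # Snapshot an already-configured instance into a new AMI.
    # no_reboot=True keeps the box running, at the risk of an
    # inconsistent filesystem snapshot.
    ami_id = conn.create_image(
        instance_id="i-12345678",
        name="custom-spark-worker-ami",
        description="Spark dependencies pre-installed",
        no_reboot=True,
    )
    print("New AMI: %s" % ami_id)

The hard part isn't the call, it's making sure the software on the box behaves 
sanely after being duplicated.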

But yes, I'd also love new AMIs that don't pull down 200 MB of packages every 
time I spin up. ("Spin up a cluster in five minutes"? Ha!) Per-region AMIs 
would also help with costs. I've thought about building new ones myself (since 
I have the experience), but I have no time and other issues to sort out first. 
Perhaps once I know Spark better.
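
(On the per-region point: copying a finished AMI into other regions is easy to 
script. A sketch, again with boto 2.x; the source AMI id and target regions 
are just placeholders for whatever custom image you build:)

    # Sketch: register copies of one AMI in other regions (boto 2.x).
    import boto.ec2

    SOURCE_REGION = "us-east-1"
    SOURCE_AMI = "ami-5bb18832"            # stand-in for your custom AMI
    TARGET_REGIONS = ["us-west-2", "eu-west-1"]

    for region in TARGET_REGIONS:
        conn = boto.ec2.connect_to_region(region)
        copy = conn.copy_image(
            source_region=SOURCE_REGION,
            source_image_id=SOURCE_AMI,
            name="custom-spark-ami-%s" % region,
        )
        print("%s -> %s" % (region, copy.image_id))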

At least with Spark we have more control over the scripts, precisely because 
they are "primitive". I had a quick look at YARN/Ambari, and it wasn't obvious 
they were any better on EC2, and they're a hundred times more complex.

I expect most AWS-heavy companies have a full-time person just managing AMIs. 
They are that annoying. It's what makes Cloudera attractive.

Jeremy Lee   BCompSci (Hons)
The Unorthodox Engineers

> On 6 Jun 2014, at 6:44 am, Matt Work Coarr <mattcoarr.w...@gmail.com> wrote:
> 
> How would I go about creating a new AMI image that I can use with the spark 
> ec2 commands? I can't seem to find any documentation.  I'm looking for a list 
> of steps that I'd need to perform to make an Amazon Linux image ready to be 
> used by the spark ec2 tools.
> 
> I've been reading through the spark 1.0.0 documentation, looking at the 
> script itself (spark_ec2.py), and looking at the github project 
> mesos/spark-ec2.
> 
> From what I can tell, the spark_ec2.py script looks up the id of the AMI 
> based on the region and machine type (hvm or pvm) using static content 
> derived from the github repo mesos/spark-ec2.
> 
> The spark ec2 script loads the AMI id from this base url:
> https://raw.github.com/mesos/spark-ec2/v2/ami-list
> (Which presumably comes from https://github.com/mesos/spark-ec2 )
> 
> For instance, since I'm working with us-east-1 and pvm, I'd end up with AMI id:
> ami-5bb18832
> 
> Is there a list of instructions for how this AMI was created?  Assuming I'm 
> starting with my own Amazon Linux image, what would I need to do to make it 
> usable where I could pass that AMI id to spark_ec2.py rather than using the 
> default spark-provided AMI?
> 
> Thanks,
> Matt
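
P.S. To answer the lookup part directly: what spark_ec2.py does boils down to 
roughly this (a sketch, not the exact code; the region and instance type are 
just example values):

    # Rough sketch of how spark_ec2.py resolves the default AMI id.
    import urllib2

    AMI_PREFIX = "https://raw.github.com/mesos/spark-ec2/v2/ami-list"

    def resolve_ami(region="us-east-1", instance_type="pvm"):
        # The repo keeps one file per region/virtualization-type pair,
        # each containing a single AMI id.
        ami_path = "%s/%s/%s" % (AMI_PREFIX, region, instance_type)
        return urllib2.urlopen(ami_path).read().strip()

    print(resolve_ami())   # e.g. ami-5bb18832

Using your own image is then just a matter of handing spark_ec2.py that AMI id 
instead (it has an --ami option for exactly this, if I remember right).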
