I've posted an initial proposal and implementation <https://issues.apache.org/jira/browse/SPARK-3821?focusedCommentId=14203280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14203280> of using Packer to automate generating Spark AMIs to SPARK-3821 <https://issues.apache.org/jira/browse/SPARK-3821>.
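(For reference, the core of the approach is a single Packer template driving
shell provisioning. A stripped-down sketch - not the exact template attached
to the JIRA; the region, source AMI id, and instance type below are
placeholders, and AWS credentials are assumed to come from the usual
environment variables:)

    # Build a Spark AMI from a stock Amazon Linux AMI, reusing the
    # existing create_image.sh as the provisioning step.
    cat > spark-ami.json <<'EOF'
    {
      "builders": [{
        "type": "amazon-ebs",
        "region": "us-east-1",
        "source_ami": "ami-XXXXXXXX",
        "instance_type": "m3.large",
        "ssh_username": "ec2-user",
        "ami_name": "spark-ami-{{timestamp}}"
      }],
      "provisioners": [{
        "type": "shell",
        "script": "create_image.sh"
      }]
    }
    EOF
    packer build spark-ami.json

(The same template could grow additional builders - Docker, GCE, VMware -
without changing the provisioning step, which is the multi-image angle
discussed further down the thread.)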
On Mon, Oct 6, 2014 at 7:40 PM, David Rowe <davidr...@gmail.com> wrote:

> I agree with this - there is also the issue of differently sized masters
> and slaves, numbers of executors for hefty machines (e.g. r3.8xlarges),
> tagging of instances and volumes (we use this for cost attribution at my
> workplace), and running in VPCs.
>
> I think it might be useful to take a layered approach: the first step
> could be getting a good, reliable image produced - Nick's ticket - and
> then doing some work on the launch script.
>
> Regarding the EMR-like service - I think I heard that AWS is planning to
> add Spark support to EMR, but as usual there's nothing firm until it's
> released.

On Tue, Oct 7, 2014 at 7:48 AM, Daniil Osipov <daniil.osi...@shazam.com> wrote:

> I've also been looking at this. Basically, the Spark EC2 script is
> excellent for small development clusters of several nodes, but isn't
> suitable for production. It handles instance setup in a single-threaded
> manner when it could easily be parallelized, and it doesn't handle
> failure well, e.g. when an instance fails to start or takes too long to
> respond.
>
> Our desire was to have an equivalent of the Amazon EMR [1] API that
> could trigger Spark jobs, including the specified cluster setup. I've
> done some work towards that end, and it would benefit greatly from an
> updated AMI.
>
> Dan
>
> [1] http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-cli-commands.html
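(To illustrate Dan's point about parallelizing instance setup - a rough
sketch only, not spark-ec2 code, assuming a "slaves" file of hostnames and
a per-slave setup script:)

    # Kick off setup on every slave concurrently instead of one at a
    # time; "wait" blocks until all background jobs finish. A timeout
    # around the wait would also catch instances that hang or never
    # come up.
    for slave in $(cat slaves); do
      ssh -o StrictHostKeyChecking=no "root@$slave" 'bash -s' < setup-slave.sh &
    done
    wait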
On Sat, Oct 4, 2014 at 7:28 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Thanks for posting that script, Patrick. It looks like a good place to
> start.
>
> Regarding Docker vs. Packer: as I understand it, you can use Packer to
> create Docker containers at the same time as AMIs and other image types.
>
> Nick

On Sat, Oct 4, 2014 at 2:49 AM, Patrick Wendell <pwend...@gmail.com> wrote:

> Hey All,
>
> Just a couple of notes. I recently posted a shell script for creating
> the AMIs from a clean Amazon Linux AMI:
>
> https://github.com/mesos/spark-ec2/blob/v3/create_image.sh
>
> I think I will update the AMIs soon to get the most recent security
> updates. For spark-ec2's purposes this is probably sufficient (we'll
> only need to re-create them every few months).
>
> However, it would be cool if someone wanted to tackle providing a more
> general mechanism for defining Spark-friendly "images" that can be used
> more broadly. I had thought that Docker might be a good way to go for
> something like this - but maybe this Packer thing is good too.
>
> For one thing, if we had a standard image, we could use it to create
> containers for running Spark's unit tests, which would be really cool.
> That would help a lot with the random port and filesystem contention
> issues we have in unit tests.
>
> I'm not sure whether the long-term home for this would be the Spark
> codebase, a community library, or something else. But it would
> definitely be very valuable to have if someone wanted to take it on.
>
> - Patrick
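(A sketch of the test-container idea Patrick describes, assuming a
hypothetical image name "spark/test-env" that such an effort would produce -
nothing like it ships today:)

    # Each test run gets its own throwaway container, so concurrent
    # builds no longer fight over ports or scratch directories on the
    # host. The checkout is mounted read-write at /spark.
    docker run --rm -v "$PWD":/spark -w /spark spark/test-env sbt/sbt test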
On Fri, Oct 3, 2014 at 5:20 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> FYI: There is an existing issue -- SPARK-3314
> <https://issues.apache.org/jira/browse/SPARK-3314> -- about scripting
> the creation of Spark AMIs.
>
> With Packer, it looks like we may be able to script the creation of
> multiple image types (VMware, GCE, AMI, Docker, etc.) at once from a
> single Packer template. That's very cool.
>
> I'll be looking into this.
>
> Nick

On Thu, Oct 2, 2014 at 8:23 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Thanks for the update, Nate. I'm looking forward to seeing how these
> projects turn out.
>
> David, Packer looks very, very interesting. I'm gonna look into it more
> next week.
>
> Nick

On Thu, Oct 2, 2014 at 8:00 PM, Nate D'Amico <n...@reactor8.com> wrote:

> A bit of progress on our end, and a bit of lagging as well. The person
> leading the effort got a little bogged down on a client project updating
> a Hive/SQL testbed to the latest Spark/Spark SQL, and we're also
> launching a public service, so we have been a bit scattered recently.
>
> We will have more updates, probably after next week. We are planning on
> taking our client work around Hive/Spark, plus taking over the Bigtop
> automation work, to modernize it and get it fit for human consumption
> outside our org. All our work and Puppet modules will be open sourced
> and documented; hopefully we can start to rally some other folks around
> the effort who find it useful.
>
> Side note: another effort we are looking into is Gradle test support.
> We have been leveraging serverspec for some basic infrastructure tests,
> but with Bigtop switching over to a Gradle build/test setup in 0.8 we
> want to support that in our own efforts - probably some of it can be
> learned from and leveraged in the Spark world for repeatable, tested
> infrastructure.
>
> If anyone has automation questions specific to their environment, you
> can drop me a line directly; I'll try to help out as best I can.
> Otherwise I'll post an update to the dev list once we get on top of our
> own product release and the Bigtop work.
>
> Nate

-----Original Message-----
From: David Rowe [mailto:davidr...@gmail.com]
Sent: Thursday, October 02, 2014 4:44 PM
To: Nicholas Chammas
Cc: dev; Shivaram Venkataraman
Subject: Re: EC2 clusters ready in launch time + 30 seconds

> I think this is exactly what Packer is for. See e.g.
> http://www.packer.io/intro/getting-started/build-image.html
>
> On a related note, the current AMI for HVM systems (e.g. m3.*, r3.*)
> has a bad package for httpd, which causes Ganglia not to start. For
> some reason I can't get access to the raw AMI to fix it.

On Fri, Oct 3, 2014 at 9:30 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Is there perhaps a way to define an AMI programmatically? Like a
> collection of a base AMI id, plus a list of required stuff to be
> installed, plus a list of required configuration changes. I'm guessing
> that's what people use things like Puppet, Ansible, or maybe AWS
> CloudFormation for, right?
>
> If we could do something like that, then with every new release of
> Spark we could quickly and easily create new AMIs that have everything
> we need. spark-ec2 would only have to bring up the instances and do a
> minimal amount of configuration, and the only things we'd need to track
> in the Spark repo are the code that defines what goes on the AMI and a
> list of the AMI ids specific to each release.
>
> I'm just thinking out loud here. Does this make sense?
>
> Nate,
>
> Any progress on your end with this work?
>
> Nick

On Sun, Jul 13, 2014 at 8:53 PM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:

> It should be possible to improve cluster launch time if we are careful
> about what commands we run during setup. One way to do this would be to
> walk down the list of things we do for cluster initialization and see
> if there is anything we can do to make things faster. Unfortunately
> this might be pretty time-consuming, but I don't know of a better
> strategy. The place to start would be the setup.sh file at
> https://github.com/mesos/spark-ec2/blob/v3/setup.sh
>
> Here are some things that take a lot of time and could be improved:
>
> 1. Creating swap partitions on all machines. We could check if there is
>    a way to get EC2 to always mount a swap partition.
> 2. Copying / syncing things across slaves. The copy-dir script is
>    called too many times right now, and each time it pauses for a few
>    milliseconds between slaves [1]. This could be improved by removing
>    unnecessary copies.
> 3. We could make less frequently used modules, like Tachyon and
>    persistent HDFS, not part of the default setup.
>
> [1] https://github.com/mesos/spark-ec2/blob/v3/copy-dir.sh#L42
>
> Thanks
> Shivaram

On Sat, Jul 12, 2014 at 7:02 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> On Thu, Jul 10, 2014 at 8:10 PM, Nate D'Amico <n...@reactor8.com>
> wrote:
>
> > Starting to work through some automation/config stuff for the Spark
> > stack on EC2 with a project. We will be focusing the work through the
> > Apache Bigtop effort to start, and can then share with the Spark
> > community directly as things progress, if people are interested.
>
> Let us know how that goes. I'm definitely interested in hearing more.
>
> Nick
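(On Shivaram's point 2 above: the serial, pause-per-slave loop in copy-dir
could be fanned out in parallel along these lines - illustrative only, not
the actual script:)

    # Rsync the directory to all slaves at once instead of one at a
    # time with a sleep in between; "wait" returns once every
    # background copy has finished.
    DIR="$1"
    for slave in $(cat /root/spark-ec2/slaves); do
      rsync -e "ssh -o StrictHostKeyChecking=no" -az "$DIR" "root@$slave:$DIR" &
    done
    wait

(For large clusters, a tree-structured broadcast - the master copies to a
few slaves, which copy onward - would scale better than a flat fan-out from
the master.)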