Many years ago, around 2009-2010 (I think Lee might remember this), we did continuous delivery based on every commit to the master branch. It was inspired by a talk I'd seen a year or so earlier by some Swedish guys who were doing live deploys of some big financial software written in Erlang. We used a Jenkins/Hudson pipeline to deploy to staging, run the Cukes, and then deploy live if everything went OK. There was a one-button rollback as a fallback.
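
The shape of that commit-to-live step, from memory, was roughly the sketch below - purely illustrative (modern Capistrano-style commands rather than whatever we actually ran, and the stage names are made up):

    #!/usr/bin/env ruby
    # Illustrative sketch only: deploy to staging, run the Cukes, then deploy
    # live, falling back to the one-button rollback if the live deploy fails.
    # Stage names and commands are assumptions, not the original scripts.

    def run!(cmd)
      puts "==> #{cmd}"
      system(cmd) or abort("FAILED: #{cmd}")
    end

    run! "bundle exec cap staging deploy"   # ship the commit to staging
    run! "bundle exec cucumber"             # the Cukes must pass before we go further

    # All green - go live. If the production deploy itself blows up, fall back
    # to the rollback.
    if system("bundle exec cap production deploy")
      puts "Live."
    else
      system("bundle exec cap production deploy:rollback")
      abort "Live deploy failed - rolled back."
    end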

We also had a bunch of slaves which we could use to run tests in parallel. Any dev could co-opt slaves to run all the tests in parallel from their own machine. We used to shout imperiously across the office, "More slaves, more slaves". Our entire pipeline from commit to going live was about 15 minutes. I can't remember all the details. The slaves were VMs run under Parallels, and we deployed with Capistrano. There was a lot of opposition to doing this, but then the business really got into it when they could interact with new work on the live site so quickly after it had been delivered. I've never worked with such a slick infrastructure again, which is a great shame.

Anyhow, just reminiscing, and also pointing out that this is nothing new and is quite doable if you have really good tests that can be run quickly.

All best,

Andrew

On 7 February 2018 at 16:39, Paul Robinson <p...@32moves.com> wrote:

> I'm not Sean, but at my place, Notonthehighstreet.com (known to those within as "NOTHS"), we've been doing continuous deployment for at least 3-4 years, and it's now on its 2nd or 3rd generation/iteration.
>
> So I'll answer these questions based on what we do, not on what Sean does, obviously. :-)
>
> On Tue, Feb 6, 2018 at 11:32 AM Ian Moss wrote:
>
> Were there pre-conditions as part of the auto deploy? i.e. all tests, or some particular tests had to pass... did it run the full suite, or was there some detection of which particular tests covered this push to master?
>
> At NOTHS we do GitHub flow with webhook integrations into various tools.
>
> When a branch is pushed and a pull request is created, we automatically create a test job named after the branch in Jenkins. This will run the full suite of tests, which for our main monolith requires 20+ spot EC2 instances and takes 25-30 minutes. We have a lot of tests (about 85% coverage), and it's a ~70k SLOC Rails application. Most people will not need to do as complex a dance as ours on that step.
>
> Normally, whilst you're doing that, you'll be deploying your branch to QA and doing some checks anyway. Most stories have a manual QA check; some do not.
>
> We also have some tasks for Rubocop checks and the like. We don't enforce style across the whole app, just the files you've touched in your PR - otherwise we'd never ship anything.
>
> We also normally require peer review, so each PR is thrown into a Slack channel and people ask for a review from another dev who can read/write code in the same language (we have services in Ruby, Java, Go and a smattering of NodeJS and Python).
>
> If all the tests are green and the PR is approved, it can be merged. If it's red for any reason, a GitHub org admin (typically a senior dev or tech lead) can merge despite the red build. We would do this when we have a P1 and we're not interested in waiting for the green build (Hail Mary deploys!), or perhaps the Rubocop "fails" are ones we're happy to ignore on this iteration.
>
> On merging to master, the tests run again, because we don't want to force rebases, but there may be regressions occurring when lots of PR merges happen at once. So, about once every 60 seconds, whatever is new on master gets the test suite run, final Rubocop checks, etc., and if all is good, we kick off a Jenkins task to deploy to production.
>
> That task builds the Docker container, pushes it, and then shouts at the schedulers to deploy it, and they then roll the instances over to the new version.
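>
> (For flavour, that task boils down to something like the sketch below - purely illustrative, with a made-up image name and a made-up scheduler endpoint, not our actual Jenkins job:)
>
>     #!/usr/bin/env ruby
>     # Illustrative only: the shape of a "build, push, tell the scheduler" deploy step.
>     # The registry, repository and scheduler URL below are invented for the example.
>     require 'net/http'
>     require 'json'
>     require 'uri'
>
>     sha   = `git rev-parse --short HEAD`.strip
>     image = "registry.example.com/noths/monolith:#{sha}"   # hypothetical image name
>
>     system("docker build -t #{image} .") or abort "docker build failed"
>     system("docker push #{image}")       or abort "docker push failed"
>
>     # "Shout at the schedulers": a hypothetical HTTP endpoint that rolls the
>     # instances over to the new image.
>     uri = URI("https://scheduler.example.com/deploy")
>     res = Net::HTTP.post(uri, { image: image }.to_json, "Content-Type" => "application/json")
>     abort "scheduler said no: #{res.code}" unless res.is_a?(Net::HTTPSuccess)
>     puts "Rolling instances over to #{image}"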
>
> You're then deployed.
>
> We would ideally build the Docker container first, then run tests against that, so the build artefact we're deploying is the one that was tested; however, there are problems with that approach I won't go into now. If you can do that, though, do it: it's a cleaner approach and will pick up any oddness in Docker, which we otherwise just assume always magically works without fail (TBF, we've never really had a problem with the images).
>
> In short, any green build of master will deploy, 24x7x52, night and day. We occasionally suspend deploys because a) we have a problem in the pipeline, or b) we are under high load, and a deploy has to kill instances and bring new ones up, and losing capacity under exceptionally high load is never great.
>
> We have build screens up around the tech team, and when a master build goes red we're meant to go "all stop", figure out what just happened and triage it until it goes back to green. In reality, we isolate that to a small group of devs who actively triage. That group is a rotation which is pseudo-randomly assigned each sprint, and they also get to handle BAU and directing bug tickets to the relevant teams. :-)
>
> Suspending CD is a manual measure taken by seniors in conversation with management, and often resisted on philosophical, technical or commercial grounds: the question is not "should we have CD turned on?" but rather "why the flip would you ever, ever not want to have CD? Are you crazy?".
>
> If you do the maths, you might spot that each feature takes about an hour to get to production, which is about right, but it's wall-clock time, not developer time: the dev is moving onto the next feature, so there's no loss of productivity, and it's rare that we care about a feature landing at 13:07 instead of 14:07.
>
> Some might do more maths and realise this means we only do about 8 deploys/day to prod.
>
> That's true for the monolith, but increasingly new features and services are being built as micro-services, and those have much smaller/faster test suites.
>
> We're still standardising on a pattern around those for consistency (to date it's been each team for themselves), but it's likely to be Jenkinsfiles with tweaks when needed for anything that involves a Docker container, and we have AWS deploy services being used for Lambdas et al. That'll evolve a lot over the next year.
>
> I presume if there's a regular commit going on, then running the full suite of tests would not be possible.
>
> It is, you just queue them up, one after the other. They'll go green/red sequentially and accordingly.
>
> Often thought it'd be great to have a tool that, given a particular class/method within a project, can identify all the tests using it. Maybe there's something already out there for this? Only 1 coffee into the day, so unsure.
>
> So for unit tests, you can probably do that. A well-written unit test is only going to test the behaviour of a single class.
>
> However, integration tests really need to test full integration. We try and do the "pyramid of testing", where we have a large number of unit tests, a smaller number of integration tests, and then a tiny number at the top of full-stack tests checking a user journey that might involve multiple services, which we run relatively rarely and normally for performance-profiling reasons.
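>
> (If it helps to picture it, the layers end up as separate CI targets along these lines - a made-up Rakefile sketch, assuming unit and integration specs live in separate directories and the journey tests are tagged Cucumber features; it's not our actual setup:)
>
>     # Rakefile (illustrative): one target per layer of the testing pyramid.
>     # The directory layout and the @journey tag are assumptions for this example.
>     namespace :test do
>       desc "Unit tests - the wide base of the pyramid, run on every branch build"
>       task :unit do
>         sh "bundle exec rspec spec/unit"
>       end
>
>       desc "Integration tests - fewer and slower, still run for every deploy"
>       task :integration do
>         sh "bundle exec rspec spec/integration"
>       end
>
>       desc "Full-stack user journeys - tiny in number, run rarely (e.g. for perf profiling)"
>       task :full_stack do
>         sh "bundle exec cucumber --tags @journey"
>       end
>     end
>
>     task default: ["test:unit", "test:integration"]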
>
> For deploying a service, we assume all the unit tests and integration tests for that service need to run, but we don't run the full-stack tests, because hopefully any such regressions will have been spotted by manual QA anyway.
>
> I'm currently working in a team on a new set of tools that means we get to go more event-sourced as an architecture, and I'm pushing for canary deploys to be something we do more of. That would give us a lot more options and confidence when deploying new versions of services.
>
> P.S. If anybody wants to come and work at ^^, working and living in London is not as horrific as you'd think, especially in leafy/sleepy Richmond in the extremities of West London, where NOTHS calls home. We have a careers page and a LinkedIn page. Tell them I sent you. :-)

--
Andrew Premdas
blog.andrew.premdas.org