Given the recent DDoS-triggered outages at linode (including the one
today that has been the worst yet, at 10 hours and counting as I
write this), I've been giving some more thought to how we can make
future outages less painful for the community.

I have an open issue[1] (but no code yet) to move the repository off
of the server and on to a block store (s3, etc), with the goal there
to make repo reads (which is what we use clojars for 99.9% of the
time) independent of the status of the server. But I'm not sure that
really solves the problem we are seeing today. Currently, we have two
points of failure for repo reads:

(1) the server itself (hosted on linode)
(2) DNS for the clojars.org domain (also hosted on linode)

Moving the repo off of the server to a block store still leaves two
points of failure:

(1) the block store (aws, rackspace, etc)
(2) DNS for the clojars.org domain (hosted on linode), since we would
     CNAME it to the block store

Though the block store provider would probably be better distributed,
and have more resources to withstand a DDoS (but do any block store
providers have 100% uptime?).

The block store solution is complex - it introduces more moving parts
into clojars, and requires reworking the way we generate usage stats,
and how the api gets its data. It also requires reworking the way we
administer the repo (deletion requests, cleaning up failed/partial
deploys). And it may not solve the availability problem at all, since
we still have two points of failure.

I think a better solution may be to have multiple mirrors of the repo,
either run by concerned citizens or maintained by the clojars staff. I
know some folks in the community already run internal caching proxies
or rsynced mirrors (and are probably chuckling knowingly at those of
us affected by the outage), but those proxies don't really help those
in the community that don't have that internal infrastructure. And I
don't want to recommend that everyone set up a private mirror - that
seems like a lot of wasted effort.

Ideally, it would be nice if we had a turn-key tool for creating a
mirror of clojars. We currently provide a way to rsync the repo[2], so
the seed for a mirror could be small, and could then slurp down the
full repo (and could continue to do so on a schedule to remain up to
date). We could then publish a list of mirrors that the community
could turn to in times of need (or use all the time, if they are
closer geographically or just generally more responsive). Any deploys
would still need to hit the primary server, but deploys are
dwarfed by reads.
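
To make "turn to in times of need" concrete, here is a rough sketch
of reading from the first host in a list that answers. It is only an
illustration: the function names and the idea of a client walking the
list itself are my assumptions, and in practice the build tool's own
mirror configuration would do this job.

```python
from typing import Callable, Iterable, Optional
from urllib.request import urlopen


def fetch_with_fallback(
    path: str,
    base_urls: Iterable[str],
    fetch: Callable[[str], bytes],
) -> Optional[bytes]:
    """Try each repo base URL in order; return the first successful body.

    `fetch` is injected so the logic can be exercised without a network.
    It is expected to raise OSError on failure (urllib's URLError and
    socket timeouts are both OSError subclasses).
    """
    for base in base_urls:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return fetch(url)
        except OSError:
            continue  # this host is down or unreachable; try the next one
    return None  # every host failed


def http_get(url: str) -> bytes:
    """Plain GET using only the standard library."""
    with urlopen(url, timeout=10) as response:
        return response.read()
```

Called with the primary repo first and the published mirrors after
it, this would only touch a mirror when the primary is unreachable;
putting a mirror first would make it the default.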

There are a few issues with using mirrors:

(1) security - with artifacts in more places, there are more
    opportunities to introduce malicious versions. This could be
    prevented if we had better tools for verifying that the artifacts
    are signed by trusted keys, and we required that all artifacts be
    signed, but that's not the case currently. But if we had a regular
    process that crawled all of the mirrors and the canonical repo to
    verify that the checksums of every artifact are identical, this could
    actually improve security, since we could detect if any checksum
    had been changed (a malicious party would have to change the
    checksum of a modified artifact, since maven/lein/boot all confirm
    checksums by default).

(2) download stats - any downloads from a mirror wouldn't get
    reflected in the stats for the artifact unless we had some way to
    report those stats back to clojars.org. We currently generate the
    stats by parsing the nginx access logs; mirrors could do the same
    and report stats back to clojars.org if we care enough about
    this. We don't get stats from the existing private mirrors, and
    the stats aren't critical, so this may be a non-issue, and
    definitely isn't something that has to be solved right away, if
    ever.
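
For the stats side, a mirror-local counter could look roughly like
the sketch below. It assumes nginx's default "combined" log format
and counts only successful GETs for paths ending in .jar; both of
those are my assumptions for illustration, not a description of how
clojars.org computes its stats today.

```python
import re
from collections import Counter
from typing import Iterable

# Picks the request line and status out of a "combined" format entry:
#   ... "GET /some/path HTTP/1.1" 200 ...
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def count_jar_downloads(lines: Iterable[str]) -> Counter:
    """Count successful jar downloads per request path."""
    counts: Counter = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # not a recognizable access log line
        if (
            match["method"] == "GET"
            and match["status"] == "200"
            and match["path"].endswith(".jar")
        ):
            counts[match["path"]] += 1
    return counts
```

A mirror could run this over its rotated logs and send the resulting
counts back; how (or whether) clojars.org would accept them is still
an open question.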

The repo is just served as static files, so I think a mirror could
simply be:

(1) a webserver (preferably (required to be?) HTTPS)
(2) a cronjob that rsyncs every N minutes

And the cronjob would just need the rsync command in [2], so, to get
this started, we just need:

(1) linode to be up
(2) people willing to run mirrors

(I would say "(3) add a page to the wiki on how to use a mirror", but
that would destroy the symmetry of all the other 2-item lists in this
message)
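
The cron half of a mirror can be sketched as a small script like the
one below. The source argument is deliberately left to the caller: it
should be whatever [2] documents, and I'm not reproducing it here.
The --archive and --delete flags are my choice for the sketch (so
that deletions on the primary propagate), not something taken from
[2].

```python
import subprocess
from typing import List


def build_rsync_command(source: str, dest: str) -> List[str]:
    """Return an rsync invocation that keeps `dest` in step with `source`."""
    return [
        "rsync",
        "--archive",  # recurse, preserving timestamps and permissions
        "--delete",   # remove local files that were deleted upstream
        source,       # pass the source exactly as documented in [2]
        dest,
    ]


def sync_mirror(source: str, dest: str) -> None:
    """Run one sync; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(build_rsync_command(source, dest), check=True)
```

Scheduling sync_mirror from cron every N minutes, with the webserver
pointed at the destination directory, would cover both items in the
mirror list above.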

And it would be nice to have the process in place to verify checksums
soon - that would actually be a boon if we had another linode
compromise[3].
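
The comparison step of that verification process might look
something like this. It assumes a separate crawler has already
collected a path-to-checksum map from each source (the canonical repo
is treated as just another source), and the names here are
hypothetical.

```python
import hashlib
from typing import Dict

Checksums = Dict[str, str]  # artifact path -> hex digest


def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 of the given bytes."""
    return hashlib.sha1(data).hexdigest()


def find_disagreements(
    sources: Dict[str, Checksums],
) -> Dict[str, Dict[str, str]]:
    """Return {path: {source_name: checksum}} for every artifact whose
    checksum is not identical across all sources that have it."""
    by_path: Dict[str, Dict[str, str]] = {}
    for source_name, checksums in sources.items():
        for path, digest in checksums.items():
            by_path.setdefault(path, {})[source_name] = digest
    return {
        path: seen
        for path, seen in by_path.items()
        if len(set(seen.values())) > 1
    }
```

An artifact missing from one mirror is not flagged, since that mirror
may simply not have synced yet. Note that this only shows that the
sources disagree; deciding which copy is the bad one would still take
a human (or signatures).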

Does anyone see any issues with this plan? I'm curious whether there
are security implications (or anything else) that I haven't thought of.

Are you willing to run a mirror?

One issue that comes to mind is if we do decide to move the repo to a
block store, it actually makes mirroring more difficult unless we keep
a copy of the repo on disk on clojars.org as well. But I would like to
have mirrors in place as soon as possible, and worry about that later.

- Toby

[1]: https://github.com/clojars/clojars-web/issues/433
[2]: https://github.com/clojars/clojars-web/wiki/Data#rsync-the-whole-classic-repository
[3]: https://groups.google.com/d/msg/clojars-maintainers/uAVJVwRAnSU/WISqQn5E9KIJ

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en