Given the recent DDoS-triggered outages at linode (including the one today that has been the worst yet, currently 10 hours at the time I'm writing this), I've been giving some more thought to how we can make future outages less painful for the community.
I have an open issue[1] (but no code yet) to move the repository off of the server and onto a block store (s3, etc), with the goal of making repo reads (which is what we use clojars for 99.9% of the time) independent of the status of the server. But I'm not sure that really solves the problem we are seeing today. Currently, we have two points of failure for repo reads:

(1) the server itself (hosted on linode)
(2) DNS for the clojars.org domain (also hosted on linode)

Moving the repo off of the server to a block store still leaves two points of failure:

(1) the block store (aws, rackspace, etc)
(2) DNS for the clojars.org domain (hosted on linode), since we would CNAME to the block store

Though the block store provider would probably be better distributed, and have more resources to withstand a DDoS (but do any block store providers have 100% uptime?).

The block store solution is complex - it introduces more moving parts into clojars, and requires reworking the way we generate usage stats and how the api gets its data. It also requires reworking the way we administer the repo (deletion requests, cleaning up failed/partial deploys). And it may not solve the availability problem at all, since we still have two points of failure.

I think a better solution may be to have multiple mirrors of the repo, either run by concerned citizens or maintained by the clojars staff. I know some folks in the community already run internal caching proxies or rsynced mirrors (and are probably chuckling knowingly at those of us affected by the outage), but those proxies don't really help those in the community that don't have that internal infrastructure. And I don't want to recommend that everyone set up a private mirror - that seems like a lot of wasted effort.

Ideally, it would be nice if we had a turn-key tool for creating a mirror of clojars.
We currently provide a way to rsync the repo[2], so the seed for a mirror could be small, and could then slurp down the full repo (and could continue to do so on a schedule to remain up to date). We could then publish a list of mirrors that the community could turn to in times of need (or use all the time, if they are closer geographically or just generally more responsive). Any deploys would still need to hit the primary server, but deploys are dwarfed by reads.

There are a few issues with using mirrors:

(1) security - with artifacts in more places, there are more opportunities to introduce malicious versions. This could be prevented if we had better tools for verifying that the artifacts are signed by trusted keys, and we required that all artifacts be signed, but that's not the case currently. But if we had a regular process that crawled all of the mirrors and the canonical repo to verify that the checksums of every artifact are identical, this could actually improve security, since we could detect if any checksum had been changed (a malicious party would have to change the checksum of a modified artifact, since maven/lein/boot all confirm checksums by default).

(2) download stats - any downloads from a mirror wouldn't get reflected in the stats for the artifact unless we had some way to report those stats back to clojars.org. We currently generate the stats by parsing the nginx access logs; mirrors could do the same and report stats back to clojars.org if we care enough about this. We don't get stats from the existing private mirrors, and the stats aren't critical, so this may be a non-issue, and definitely isn't something that has to be solved right away, if ever.

The repo is just served as static files, so I think a mirror could simply be:

(1) a webserver (preferably (required to be?) HTTPS)
(2) a cronjob that rsyncs every N minutes

And the cronjob would just need the rsync command in [2], so, to get this started, we just need:

(1) linode to be up
(2) people willing to run mirrors

(I would say "(3) add a page to the wiki on how to use a mirror", but that would destroy the symmetry of all the other 2-item lists in this message)

And it would be nice to have the process in place to verify checksums soon - that would actually be a boon if we had another linode compromise[3].

Does anyone see any issues with this plan? I'm curious whether there are security implications (or anything else) that I haven't thought of. Are you willing to run a mirror?

One issue that comes to mind is that if we do decide to move the repo to a block store, it actually makes mirroring more difficult unless we keep a copy of the repo on disk on clojars.org as well. But I would like to have mirrors in place as soon as possible, and worry about that later.

- Toby

[1]: https://github.com/clojars/clojars-web/issues/433
[2]: https://github.com/clojars/clojars-web/wiki/Data#rsync-the-whole-classic-repository
[3]: https://groups.google.com/d/msg/clojars-maintainers/uAVJVwRAnSU/WISqQn5E9KIJ

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
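
For illustration only, the checksum crawl proposed in the message could look something like the Python sketch below. It is not existing clojars code. It assumes each mirror serves the standard Maven ".sha1" sidecar file next to every artifact, and it compares those sidecar files rather than re-hashing the artifacts, following the reasoning that a tampered artifact needs a tampered checksum to get past maven/lein/boot. The function names, the injectable fetch parameter, and the idea of passing in a list of artifact paths are all hypothetical.

```python
import urllib.request
from typing import Callable, Dict, Iterable, List


def http_fetch(url: str) -> bytes:
    """Download a URL and return its body (raises OSError subclasses on failure)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def read_sha1(raw: bytes) -> str:
    """Extract the hex digest from a .sha1 file, which may be 'digest' or 'digest  filename'."""
    parts = raw.decode("ascii", errors="replace").split()
    return parts[0].lower() if parts else ""


def find_mismatches(
    canonical: str,
    mirrors: Iterable[str],
    artifact_paths: Iterable[str],
    fetch: Callable[[str], bytes] = http_fetch,
) -> Dict[str, List[str]]:
    """Return {artifact_path: [mirrors whose published sha1 differs or is unreachable]}.

    A failure to fetch from the canonical repo is not caught here; it propagates
    to the caller, since nothing can be verified without the reference checksum.
    """
    mirror_list = list(mirrors)
    mismatches: Dict[str, List[str]] = {}
    for path in artifact_paths:
        expected = read_sha1(fetch(f"{canonical}/{path}.sha1"))
        suspect = []
        for mirror in mirror_list:
            try:
                actual = read_sha1(fetch(f"{mirror}/{path}.sha1"))
            except OSError:
                actual = None  # an unreachable mirror is reported, not ignored
            if actual != expected:
                suspect.append(mirror)
        if suspect:
            mismatches[path] = suspect
    return mismatches
```

A crawler built on this would still need a way to enumerate artifact paths (for example by walking the rsynced copy on disk) and somewhere to report the results; both are left out of the sketch.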