On 13/10/13 at 08:44 +0200, Tollef Fog Heen wrote:
> We appreciate feedback while we continue our investigation of CDNs.

Hi,

I'm trying to summarize the discussion so far and add my own
understanding/thoughts, in a set of Q & A.

Q: What problem are we trying to solve? What's the current status?
==================================================================

The Debian project needs to distribute content over HTTP; mainly
packages (on ftp.d.o and security.d.o) and websites (such as www.d.o).
For that, a set of machines running Debian, managed by DSA and located
in various datacenters all over the world, is used.

Additionally, for ftp.d.o, Debian relies on mirrors provided by third
parties. Those mirrors are not managed by DSA, and might be running
non-free software (that could include primary mirrors, i.e.
ftp.*.debian.org).

Mirroring the Debian archive is tricky, as files need to be copied in
the correct order (to avoid having files in dists/ point to files not
yet copied in pool/). A script (ftpsync[1]) is provided by the mirrors
team, but is not used on every mirror AFAIK.

[1] http://www.debian.org/mirror/ftpmirror

The performance of our {packages,website} delivery network is an
interesting question. Like many things on the Internet, it's related to
a mix of bandwidth, latency, and application behaviour (e.g. use of HTTP
keep-alive). More and more, the dominating factor in network performance
is latency (as the others are easier to optimize), and the only way to
reduce it is to have servers close (geographically or network-wise) to
end users. Benchmarking mirrors by measuring bandwidth is generally not
very relevant.

This raises several challenges:
- DSA needs to interact with many datacenters, often for only one
  machine. This is very time-consuming.
- The mirrors team needs to constantly monitor mirrors and notify mirror
  operators in case of problems. Notifications are automated, but DNS
  updates to *.debian.org when a mirror fails are not.
- There are parts of the world that are not so well covered.
  For example, http://deb.li/y8GA is the current map of security.d.o
  mirrors (which are all managed by DSA): we don't have any point of
  presence in Asia, which causes poor performance. There are discussions
  in progress to buy a server and host it somewhere in Asia, and the
  cost for Debian would be between $1500 and $2500 depending on the
  server's specs.

One solution that has been developed is http.d.n. It's a redirector
service that redirects to the closest working mirror (the mirror
checking is automated). However, the http.d.n machine is still
centralized: round-trip time to it is still a problem, so, if the
service became official, several geographically-distributed instances of
the service would have to be set up. Also, as each request goes through
a http.d.n redirect, there's a lot of additional latency. If we want
those http.d.n redirector machines to be managed by DSA (which is
probably something we want), it doesn't really improve the situation in
terms of the number of machines DSA has to manage.

Q: What are CDNs? How do they compare to our mirrors network?
=============================================================

Content Delivery Networks (Akamai, Fastly, Amazon CloudFront, etc. [1])
can be seen as giant location-aware caching networks. They provide
"local" points of presence and manage global caching of external data
inside the CDN network.

[1] http://www.cdnplanet.com/cdns/

As a solution based on caching, they work and perform quite differently
from our mirrors (where the Debian archive is fully replicated). It's
not easy to compare their performance, especially if you want to
consider access patterns on the mirrors (file sizes, long tail
distribution, etc.).

Q: Do CDNs raise more security/privacy concerns than our mirrors?
=================================================================

Not easy to answer. I'm inclined to say that they both raise about the
same amount of concerns.
There's more discussion about those points in the subthread starting at
http://lists.debian.org/2a773832-09f2-4adb-9b10-2a554b6dd...@2013.bluespice.org

Q: How does that meet with Debian's Social Contract and Free Software in
general?
========================================================================

Some CDNs use Free Software. As data points, Fastly[1,2] uses and
contributes to Varnish, and the frontend servers of Amazon CloudFront
are running Apache.

[1] http://www.fastly.com/about
[2] http://www.fastly.com/about/open-source

Building a CDN is mostly an infrastructure problem: bring PoPs to many
parts of the world, manage those servers, etc. It would be about "Free
Infrastructure" more than "Free Software".

How much do we (Debian) care about Free Infrastructure?

The Social Contract says:
> 1. Debian will remain 100% free
> [..] We promise that the Debian system and all its components will be
> free according to these guidelines. [..] We will never make the system
> require the use of a non-free component.

Where does "the Debian system and all its components" stop? Does it
include our packages / website content delivery network? I'm inclined to
say "no".

The Social Contract also says:
> 2. We will give back to the free software community
> When we write new components of the Debian system, we will license them
> in a manner consistent with the Debian Free Software Guidelines. [...]

However, that doesn't address using "components" developed by third
parties, and is restricted to "components of the Debian system". So, I'm
inclined to say that the Social Contract doesn't say anything about the
current question.

So, one question is more: where do we draw the line?
- Should we use machines that require non-free firmware in the Debian
  infrastructure? (that's something we currently do)
- Should we have a stricter policy about the use of free software in our
  official mirror network?
- What about network equipment running non-free software?

The line has to be drawn somewhere, and I honestly don't know if CDNs
should be below or above the line.

Another question is whether maintaining our own CDN is really something
we *need* to spend our energy on. I don't think that the delivery of
packages is central to the mission of Debian, nor do I think that
maintaining our own CDN strengthens our message regarding software
freedom. After all, if we could use and point to 3-4 CDNs that are
advocating Free Software, isn't it better to show that such core
Internet services can be run using Free Software?

Q: Where should we go from here?
================================

CDNs raise significant challenges:
- Can we find 3-4 CDNs (to remain independent) that:
  + are willing to provide that service for free to Debian
  + are IPv6-compliant and meet our other technical requirements
  + are publicly free software-friendly
  (none of those are super-strong requirements, but if they are not met,
  that raises additional questions)
- Can we combine those various CDNs under the same *.debian.org name?
- Can we solve the problem of data in dists/, which needs a specific
  caching policy?

I would like to encourage more experimentation from DSA on this, either
as an unofficial service or as an official service under a different DNS
name ({ftp,security,www}.cdn.debian.org?). However, I think that it's
too early to make CDN-provided hosts part of the resolution of "normal"
DNS names such as security.d.o or www.d.o, until we have a better
understanding of the pros and cons of CDNs.

Finally, I think that we should continue to provide an easy way for
someone to run their own Debian mirror. (But in the distant future, if
our feedback on CDNs is positive, it means that we could remove some of
Debian's PoPs, since mirrors could synchronize over rsync from more
central locations.)

Lucas