First of all a bit of top posting here because I want to paint my response to Greg and then ultimately Ivan's email with a bit of background. In the process of preparing to do the 1.8.1 release I spent quite a bit of time reviewing nominated changes. I'll admit I hadn't paid too much attention to this issue while I was on vacation or when I got back last week. It seemed like there was a agreed upon solution. What I didn't realize is that we'd agreed upon an option named "busted-proxy" and agreed upon forcing users to set an option in order to work in these circumstances.
I'm replying to Greg's and Ivan's emails because after re-reading this entire thread these two emails present the two essential arguments for the option to exist at all. 1) That the proxies are busted and that having an option named busted-proxy we will encourage proxy administrators to fix the problem. Greg's email makes this argument. 2) That there is a performance cost of making the extra request and that the request is too high of a cost for users that don't have these proxies in their way. Ivan's email makes this argument. Most people on this thread have argued that we should automatically detect this behavior and adjust accordingly. The above are the two strongest arguments for the option. On Fri, Jun 28, 2013 at 8:37 PM, Greg Stein <gst...@gmail.com> wrote: > I'm not seeing the point. Subversion will (now) work, but we still view the > proxy as busted. It doesn't provide the performance characteristics that > Subversion wants and expects. Note that Subversion is built to work against > mod_dav_svn which is HTTP/1.1 with chunked requests. The interposition of a > proxy that disables chunked requests... busted. Greg's argument here mostly depends on the idea that Subversion is built to work against mod_dav_svn and any proxy or implementation that doesn't implement HTTP/1.1 with chunked requests support as mod_dav_svn does is busted. I find this argument very weak and unconvincing. Especially since this assumes that mod_dav_svn is the only server side implementation of our protocol. We know this is not the case. GitHub has their own implementation of the DAV server side in order to support Subversion clients checking out of git repos hosted there. I'm sure there are others out there we aren't necessarily even aware of (I can think of at least one more but I'm not going to go into the details since it's not important here). If we fail to follow the standard those sorts of implementations are the things we risk losing as a project. In this case I don't think GitHub's functionality breaks because of this issue. But let's say that their implementation for some reason chose not to support chunked requests? If we started breaking GitHub's server implementation with our clients on a regular basis because we wanted to require the use features of HTTP that we have not required in the past what do you suppose they would do? I'd expect that they'd probably eventually give up on bothering to support our client. I see this as a turning point for this project. If we go down this road, then even our DAV protocol is not following a standard. I know we threw out DeltaV with HTTPv2, but the distinction here is that we were backwards compatible. In this case we are not backwards compatible with what we supported in Subversion 1.7.x because we did not fail when proxied through a proxy that refused chunked requests. While I think it's been sufficiently proven elsewhere in this thread I'll make the argument that we are not HTTP/1.1 compliant if we don't handle this since part of my argument above assumes that we're not compliant if we don't handle this case. The draft RFC for improving the HTTP/1.1 standard, says that you don't need to support chunked requests to be HTTP/1.1 compliant. Kevin linked it earlier but here it is again: http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#section-3.3.3 The important text is [[[ A server MAY reject a request that contains a message body but not a Content-Length by responding with 411 (Length Required). ]]] The chair of the working group posted this: http://www.mnot.net/blog/2011/07/11/what_proxies_must_do Obviously, it would be nice if everyone supported chunked requests. But the reality is that hasn't fully happened. RFC 2616 left the door open to HTTP/1.1 servers to not support chunked requests by providing the 411 status response. Eventually the HTTPbis draft will become an RFC and these ambiguities will be entirely removed. As such I assert that the HTTP/1.1 standard does allow this behavior and that these proxies are not broken, busted, non-compliant or whatever phrase you want to use to that effect. (Yes I originally thought chunked was required, but the above two links have convinced me otherwise). It's hard to argue that we're following the HTTP/1.1 standard when our behavior when proxied through a server that is exercising it's option to not implement chunked requests is to fail. A fundamental feature of HTTP is negotiation to prevent these sorts of failures so that everyone can independently implement the standard. > "Prefer" is not the same as "must" :-) This recommendation is a suggestion to avoid the very problem we're talking about here. Specifically it says (again from the IETF link above): [[[ Unless a transfer coding other than chunked has been applied, a client that sends a request containing a message body SHOULD use a valid Content-Length header field if the message body length is known in advance, rather than the chunked transfer coding, since some existing services respond to chunked with a 411 (Length Required) status code even though they understand the chunked transfer coding. This is typically because such services are implemented via a gateway that requires a content-length in advance of being called and the server is unable or unwilling to buffer the entire request before processing. ]]] This is flat out the "be liberal with what you accept and conservative with what you send policy". We've now been bitten by the very issue that this paragraph exists to warn us of. Are we really so arrogant as to suggest that we should ignore this advice? > In our current model, we prefer chunked. But there is a pretty > straightforward extension to serf's bucket model that should allow us to use > C-L in many situations. We *might* be able to do that in a serf 1.x, but I'm > not sure. Worst case, we'll add the feature in serf 2.x. Ivan has managed to implement this on Serf trunk. It seems that the plan is now to include it in Serf 1.4.x which Ivan says should be out in a month (not holding my breath but still relatively soon). Given this development I'm wondering if the option makes sense at all. Shouldn't we just eat the extra round trip to detect this issue and then when we build against Serf 1.4.x simply stop issuing the extra request to detect it? Which brings us to Ivan's argument, which is that the cost is too expensive to just detect this always. On Tue, Jun 25, 2013 at 3:21 PM, Ivan Zhakov <i...@visualsvn.com> wrote: > Please note that this extra request is per session and currently we > create many sessions even during one operation. And I'm also not happy > to make performance worse for users who doesn't use reverse proxies > and etc. First and foremost I'd argue that this is not backed by any performance measurements. We have the feature implemented. How bad does it really impact performance. We limit RA serf to a max of 8 connections, but with things like svn:externals we open new connections and then close them, so 8 may not be the absolute worst case scenario. [[[ #define SVN_RA_SERF__MAX_CONNECTIONS_LIMIT 8 ]]] We also require that we can make at least 2 connections to the server. So it's hard to say what the typical behavior would be across so many different configurations. But worst case scenario is always going to be number_connections_opened * round_trip_time (RTT). The corporate users that Ivan is worried about that are likely to have no issues here because they have no such proxy in their way are the ones likely to have the lowest RTT. So let's say you have a RTT of 500ms, that means your command is going to be delayed by 4 seconds assuming 8 connections, it also means that you're going to have a pretty bad time using Subversion anyway. Yes that sucks, but I doubt very many of our commands actually make use of all 8 of those connections, remember we only actually require 2. Right now from the US I have a RTT to svn.eu.apache.org of about 180ms and to svn.us.apache.org of about 20ms, meaning worst case scenario for those is a delay of 1.4 seconds and 160ms. I have a very hard time believing that the above is worth the concern. We should implement this detection behavior by default. Given Ivan's work on Serf to prefer sending Content-Length, we can disable it if we're built against a Serf that behaves this way. I think that we should defer from adding an option to manipulate this behavior until such a time that we find significant performance issues. Especially if down the line we're not going to need it because we can prefer CL. If we feel that we MUST have the option then I think a different name should be selected. I realize that Greg pushed for this name in order to encourage proxy admins to fix their proxies to support chunked requests. But that name seems like a huge mistake in light of the trend towards HTTP/1.1 explicitly allowing servers to behave this way. Ignoring any issues with the busted part of "busted-proxy" it also doesn't follow our naming conventions. We have several http-proxy-* options. I realize these are for client forward requests, but if we're going to add an option we should follow our naming conventions, meaning at a minimum http- should be prefixed. If we must stay with this "busted-proxy" name then "http-proxy-busted" is a better match to what we have. However, i think that risks confusing users who think it only applies when they have a forward proxy configured. I'd suggest http-detect-chunking with a description of "Detect if the server supports chunked requests. This may be necessary for use when a proxy does not support chunked requests. By default this is off because mod_dav_svn always supports chunked requests and detection of this hurts performance." I'll look some more tomorrow into other options here with respect to a better way to detect this but I'm having a hard time believe are options are anything other than 1) extra request, 2) don't use chunked requests, 3) punt on supporting these proxies.