On Wed, Sep 9, 2015 at 12:31 PM, Evgeny Kotkov <evgeny.kot...@visualsvn.com> wrote:
> Stefan Fuhrmann <stefan.fuhrm...@wandisco.com> writes: > > >> The original commit begins using svn_stream_wrap_buffered_read() during > >> svnadmin load-revprops and svnfsfs load-index. This patch, however, > does > >> something entirely different and adds buffering to *every* stdin. > > > > Sorry for the confusion. I did not intend to actually change the > > implementation of svn_stream_for_stdin this way but simply tried to > > demonstrate a problem with APR buffer reads for "streamy" file handles. > > Thank you for the explanation. > After some debugging I found the reason for the ra-test hang. Once I knew it, it's been kind of obvious from the code: For buffered files, apr_file_read() is more or less equivalent to apr_file_write_full(). Changing that in APR might break APR internals (buffer may be half-filled but not be at EOF) as well as user code (read_full required where a normal read used to be sufficient). > > The underlying problem is still present: If stdin can't deliver data fast > > enough (e.g. some hick-up / long latency on the producer side of a dump | > > load), a buffered APR file will error out while a non-buffered one will > > simply wait & retry. > > > > However, I have yet to try and provoke the error specifically for the > > dump | load scenario. > > I think that this problem is nonexistent. > > A program doesn't need to handle EAGAIN during read() unless it puts the > file > into the non-blocking mode with O_NONBLOCK [1]. We don't do that for > STDIN, > and, irrespectively of what happens with the data on the other side of the > pipe, read() is going to block until the requested data is available. > > [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html > You are right. I tried various methods to delay the producer side and it never caused the consumer to error out. The ra-test hang mislead me into believing that the EAGAIN handling was the problem. So, you are absolutely right about APR buffering being enough for our non-interactive "read stdin as a single large file" usage in load & friends. My last concern is as follows. We need an alternative implementation for svn_stream_for_stdin. I see 3 options: 1. svn_stream_for_stdin2 with buffered option. 2. Always enable buffering in svn_stream_for_stdin. 3. Some private API. The problem with 1. is that if we use APR_BUFFERED for it, the new API will be add leakage to the abstraction: the user has to know when it is safe to enable buffering. This problem does not occur if the wrapper stream is being used instead. Thus, variant 2 is impossible with APR_BUFFERED but quite possible using the buffering wrapper. The only way it could cause a problem is if some code was to rely on the actual size of a read request from stdin. Since the latter is managed by the OS + C runtime, it is hard to think of a way to make this happen - but still. Having a fall-back would be nice, i.e. variant 1 is safer than 2. Option 3 is basically saying "the public API is not good enough but we don't want to give you a better one". It is, however, the only variant where I would be fully comfortable just using APR buffering. Your thoughts? -- Stefan^2.