Julian Foad wrote on Mon, Jun 06, 2011 at 15:45:11 +0100: > On Fri, 2011-06-03, Daniel Shahaf wrote: > > Julian Foad wrote on Fri, Jun 03, 2011 at 09:52:21 +0100: > > > +++ subversion/libsvn_ra_svn/marshal.c (working copy) > > > @@ -563,36 +563,6 @@ svn_error_t *svn_ra_svn_write_tuple(svn_ > > > -read_long_string(svn_ra_svn_conn_t *conn, apr_pool_t *pool, > > > - svn_stringbuf_t *stringbuf, apr_uint64_t len) > > > -{ > > > - char readbuf[4096]; > > > - apr_size_t readbuf_len; > > > - > > > - /* We can't store strings longer than the maximum size of apr_size_t, > > > - * so check for wrapping */ > > > - if (((apr_size_t) len) < len) > > > > Why did you remove this check? > > I was removing all this relatively new code and it wasn't there before, > and I wasn't convinced it was very important. But it would provide a > quick exit (instead of filling up all available memory) in some error > cases or potential DOS attempts so I'll re-instate it. >
Okay. (I asked because this hunk didn't seem related to what your message said the patch did.) > > > - return svn_error_create(SVN_ERR_RA_SVN_MALFORMED_DATA, NULL, > > > - _("String length larger than maximum")); > > > - ... > > > + SVN_ERR(readbuf_read(conn, pool, dest, readbuf_len)); > > > + > > > + stringbuf->len += readbuf_len; > > > + stringbuf->data[stringbuf->len] = '\0'; > > > + len -= readbuf_len; > > > > What happens if, at this point, there are less than READBUF_LEN bytes > > available? > > > > (e.g., if the client terminates the connection in the middle of > > a string literal.) > > This part of the behaviour is unchanged from the original code. I don't > know what happens, but I imagine readbuf_read() returns an error, > perhaps SVN_ERR_RA_SVN_CONNECTION_CLOSED which I see readbuf_input() can > do. > Hmm. Right, it seems obvious when I look at the code now. (For some reason I'm sure I was reading different code at the time I wrote the question.) But... isn't there a bug here? [[[ Index: subversion/libsvn_ra_svn/marshal.c =================================================================== --- subversion/libsvn_ra_svn/marshal.c (revision 1133231) +++ subversion/libsvn_ra_svn/marshal.c (working copy) @@ -285,14 +285,15 @@ static svn_error_t *readbuf_input(svn_r apr_size_t *len, apr_pool_t *pool) { svn_ra_svn__session_baton_t *session = conn->session; + apr_size_t wanted = len; if (session && session->callbacks && session->callbacks->cancel_func) SVN_ERR((session->callbacks->cancel_func)( session->callbacks_baton)); SVN_ERR(svn_ra_svn__stream_read(conn->stream, data, len)); - if (*len == 0) + if (*len < wanted) return svn_error_create(SVN_ERR_RA_SVN_CONNECTION_CLOSED, NULL, NULL); if (session) ]]] >From svn_stream_t doxygen: * Handlers are obliged to complete a read or write to the maximum * extent possible; thus, a short read with no associated error implies * the end of the input stream, and a short write should never occur * without an associated error. > The revised patch is attached. > > - Julian > > Simplify and optimize RA-svn's string marshalling. > > This largely reverts r1028352 which merged r985606 and r1028092 from the > performance branch. That change used separate functions to handle short and > long strings. This change instead improves the original code to handle both > short and long strings efficiently. > > Compared with the pre-r1028352 code, this version uses a 1-MB chunk size > instead of 4-KB and eliminates one memcpy of the data. > > * subversion/libsvn_ra_svn/marshal.c > (read_long_string): Remove. > (read_string): Handle long strings with the same code as short strings, by > reading directly into pre-allocated space, but pre-allocate no more than > a reasonable chunk size at a time. > --This line, and those below, will be ignored-- > > Index: subversion/libsvn_ra_svn/marshal.c > =================================================================== > --- subversion/libsvn_ra_svn/marshal.c (revision 1132610) > +++ subversion/libsvn_ra_svn/marshal.c (working copy) > @@ -563,14 +563,13 @@ svn_error_t *svn_ra_svn_write_tuple(svn_ > > /* --- READING DATA ITEMS --- */ > > -/* Read LEN bytes from CONN into a supposedly empty STRINGBUF. > - * POOL will be used for temporary allocations. */ > -static svn_error_t * > -read_long_string(svn_ra_svn_conn_t *conn, apr_pool_t *pool, > - svn_stringbuf_t *stringbuf, apr_uint64_t len) > +/* Read LEN bytes from CONN into already-allocated structure ITEM. > + * Afterwards, *ITEM is of type 'SVN_RA_SVN_STRING', and its string > + * data is allocated in POOL. */ > +static svn_error_t *read_string(svn_ra_svn_conn_t *conn, apr_pool_t *pool, > + svn_ra_svn_item_t *item, apr_uint64_t len) > { > - char readbuf[4096]; > - apr_size_t readbuf_len; > + svn_stringbuf_t *stringbuf; > > /* We can't store strings longer than the maximum size of apr_size_t, > * so check for wrapping */ > @@ -578,67 +577,36 @@ read_long_string(svn_ra_svn_conn_t *conn > return svn_error_create(SVN_ERR_RA_SVN_MALFORMED_DATA, NULL, > _("String length larger than maximum")); > > + /* Read the string in chunks. The chunk size is large enough to avoid > + * re-allocation in typical cases, and small enough to ensure we do not > + * pre-allocate an unreasonable amount of memory if (perhaps due to > + * network data corruption or a DOS attack), we receive a bogus claim that > + * a very long string is going to follow. In that case, we start small > + * and wait for all that data to actually show up. This does not fully > + * prevent DOS attacks but makes them harder (you have to actually send > + * gigabytes of data). */ > + stringbuf = svn_stringbuf_create_ensure( > + (apr_size_t)(len < SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD > + ? len : > SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD), > + pool); > + Sorry to point this just now, but this change seems to be problematic for the common case: It will make *every* string read allocate 1MB of memory --- in every single place in the protocol where a 'string' (as opposed to a 'word' or a tuple) is expected. And, worse, the string will then *remain* allocated in that 1MB memory space --- even if the string is just "svn:author\0". I think we want different default sizes for different strings --- i.e., a different size if the string is a property name or an fspath v. if it's an svndiff chunk of file contents --- but I'm entirely unsure that we can do that without violating layering or extending the protocol. > + /* Read the string data directly into the string structure. > + * Do it iteratively, if necessary. */ > while (len) > { > + apr_size_t readbuf_len > + = (apr_size_t)(len < SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD > + ? len : SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD); > + > + svn_stringbuf_ensure(stringbuf, stringbuf->len + readbuf_len + 1); > + SVN_ERR(readbuf_read(conn, pool, stringbuf->data + stringbuf->len, > + readbuf_len)); > > + stringbuf->len += readbuf_len; This is related to the overflow check on LEN earlier: because of having that check, we know that this addition won't overflow. > + stringbuf->data[stringbuf->len] = '\0'; > len -= readbuf_len; > } > > /* Return the string properly wrapped into an RA_SVN item. */ > item->kind = SVN_RA_SVN_STRING; > item->u.string = svn_stringbuf__morph_into_string(stringbuf);