Julian Foad wrote on Mon, Jun 06, 2011 at 15:45:11 +0100:
> On Fri, 2011-06-03, Daniel Shahaf wrote:
> > Julian Foad wrote on Fri, Jun 03, 2011 at 09:52:21 +0100:
> > > +++ subversion/libsvn_ra_svn/marshal.c    (working copy)
> > > @@ -563,36 +563,6 @@ svn_error_t *svn_ra_svn_write_tuple(svn_
> > > -read_long_string(svn_ra_svn_conn_t *conn, apr_pool_t *pool,
> > > -                 svn_stringbuf_t *stringbuf, apr_uint64_t len)
> > > -{
> > > -  char readbuf[4096];
> > > -  apr_size_t readbuf_len;
> > > -
> > > -  /* We can't store strings longer than the maximum size of apr_size_t,
> > > -   * so check for wrapping */
> > > -  if (((apr_size_t) len) < len)
> > 
> > Why did you remove this check?
> 
> I was removing all this relatively new code and it wasn't there before,
> and I wasn't convinced it was very important.  But it would provide a
> quick exit (instead of filling up all available memory) in some error
> cases or potential DOS attempts so I'll re-instate it.
> 

Okay.  (I asked because this hunk didn't seem related to what your
message said the patch did.)

> > > -    return svn_error_create(SVN_ERR_RA_SVN_MALFORMED_DATA, NULL,
> > > -                            _("String length larger than maximum"));
> > > -
...
> > > +      SVN_ERR(readbuf_read(conn, pool, dest, readbuf_len));
> > > +
> > > +      stringbuf->len += readbuf_len;
> > > +      stringbuf->data[stringbuf->len] = '\0';
> > > +      len -= readbuf_len;
> > 
> > What happens if, at this point, there are less than READBUF_LEN bytes
> > available?
> > 
> > (e.g., if the client terminates the connection in the middle of
> > a string literal.)
> 
> This part of the behaviour is unchanged from the original code.  I don't
> know what happens, but I imagine readbuf_read() returns an error,
> perhaps SVN_ERR_RA_SVN_CONNECTION_CLOSED which I see readbuf_input() can
> do.
> 

Hmm.  Right, it seems obvious when I look at the code now.  (For some
reason I'm sure I was reading different code at the time I wrote the
question.)

But... isn't there a bug here?

[[[
Index: subversion/libsvn_ra_svn/marshal.c
===================================================================
--- subversion/libsvn_ra_svn/marshal.c  (revision 1133231)
+++ subversion/libsvn_ra_svn/marshal.c  (working copy)
@@ -285,14 +285,15 @@ static svn_error_t *readbuf_input(svn_r
 
                                   apr_size_t *len, apr_pool_t *pool)
 {
   svn_ra_svn__session_baton_t *session = conn->session;
+  apr_size_t wanted = len;
 
   if (session && session->callbacks &&
       session->callbacks->cancel_func)
     SVN_ERR((session->callbacks->cancel_func)(
               session->callbacks_baton));
 
   SVN_ERR(svn_ra_svn__stream_read(conn->stream, data, len));
-  if (*len == 0)
+  if (*len < wanted)
     return svn_error_create(SVN_ERR_RA_SVN_CONNECTION_CLOSED, NULL, NULL);
 
   if (session)
]]]

>From svn_stream_t doxygen:

 * Handlers are obliged to complete a read or write to the maximum
 * extent possible; thus, a short read with no associated error implies
 * the end of the input stream, and a short write should never occur
 * without an associated error.

> The revised patch is attached.
> 
> - Julian
> 

> Simplify and optimize RA-svn's string marshalling.
> 
> This largely reverts r1028352 which merged r985606 and r1028092 from the
> performance branch.  That change used separate functions to handle short and
> long strings.  This change instead improves the original code to handle both
> short and long strings efficiently.
> 
> Compared with the pre-r1028352 code, this version uses a 1-MB chunk size
> instead of 4-KB and eliminates one memcpy of the data.
> 
> * subversion/libsvn_ra_svn/marshal.c
>   (read_long_string): Remove.
>   (read_string): Handle long strings with the same code as short strings, by
>     reading directly into pre-allocated space, but pre-allocate no more than
>     a reasonable chunk size at a time.
> --This line, and those below, will be ignored--
> 
> Index: subversion/libsvn_ra_svn/marshal.c
> ===================================================================
> --- subversion/libsvn_ra_svn/marshal.c        (revision 1132610)
> +++ subversion/libsvn_ra_svn/marshal.c        (working copy)
> @@ -563,14 +563,13 @@ svn_error_t *svn_ra_svn_write_tuple(svn_
>  
>  /* --- READING DATA ITEMS --- */
>  
> -/* Read LEN bytes from CONN into a supposedly empty STRINGBUF.
> - * POOL will be used for temporary allocations. */
> -static svn_error_t *
> -read_long_string(svn_ra_svn_conn_t *conn, apr_pool_t *pool,
> -                 svn_stringbuf_t *stringbuf, apr_uint64_t len)
> +/* Read LEN bytes from CONN into already-allocated structure ITEM.
> + * Afterwards, *ITEM is of type 'SVN_RA_SVN_STRING', and its string
> + * data is allocated in POOL. */
> +static svn_error_t *read_string(svn_ra_svn_conn_t *conn, apr_pool_t *pool,
> +                                svn_ra_svn_item_t *item, apr_uint64_t len)
>  {
> -  char readbuf[4096];
> -  apr_size_t readbuf_len;
> +  svn_stringbuf_t *stringbuf;
>  
>    /* We can't store strings longer than the maximum size of apr_size_t,
>     * so check for wrapping */
> @@ -578,67 +577,36 @@ read_long_string(svn_ra_svn_conn_t *conn
>      return svn_error_create(SVN_ERR_RA_SVN_MALFORMED_DATA, NULL,
>                              _("String length larger than maximum"));
>  
> +  /* Read the string in chunks.  The chunk size is large enough to avoid
> +   * re-allocation in typical cases, and small enough to ensure we do not
> +   * pre-allocate an unreasonable amount of memory if (perhaps due to
> +   * network data corruption or a DOS attack), we receive a bogus claim that
> +   * a very long string is going to follow.  In that case, we start small
> +   * and wait for all that data to actually show up.  This does not fully
> +   * prevent DOS attacks but makes them harder (you have to actually send
> +   * gigabytes of data). */
> +  stringbuf = svn_stringbuf_create_ensure(
> +                (apr_size_t)(len < SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD
> +                             ? len : 
> SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD),
> +                pool);
> +

Sorry to point this just now, but this change seems to be problematic
for the common case:

It will make *every* string read allocate 1MB of memory --- in every
single place in the protocol where a 'string' (as opposed to a 'word' or
a tuple) is expected.  And, worse, the string will then *remain*
allocated in that 1MB memory space --- even if the string is just
"svn:author\0".

I think we want different default sizes for different strings --- i.e.,
a different size if the string is a property name or an fspath v. if
it's an svndiff chunk of file contents --- but I'm entirely unsure that
we can do that without violating layering or extending the protocol.

> +  /* Read the string data directly into the string structure.
> +   * Do it iteratively, if necessary.  */
>    while (len)
>      {
> +      apr_size_t readbuf_len
> +        = (apr_size_t)(len < SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD
> +                       ? len : SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD);
> +
> +      svn_stringbuf_ensure(stringbuf, stringbuf->len + readbuf_len + 1);
> +      SVN_ERR(readbuf_read(conn, pool, stringbuf->data + stringbuf->len,
> +                           readbuf_len));
>  
> +      stringbuf->len += readbuf_len;

This is related to the overflow check on LEN earlier: because of having
that check, we know that this addition won't overflow.

> +      stringbuf->data[stringbuf->len] = '\0';
>        len -= readbuf_len;
>      }
>  
>    /* Return the string properly wrapped into an RA_SVN item. */
>    item->kind = SVN_RA_SVN_STRING;
>    item->u.string = svn_stringbuf__morph_into_string(stringbuf);

Reply via email to