Martin Ågren <martin.ag...@gmail.com> writes:

> If two threads have one freshly initialized string buffer each and call
> strbuf_reset on them at roughly the same time, both threads will be
> writing a '\0' to strbuf_slopbuf. That is not a problem in practice
> since it doesn't matter in which order the writes happen. But
> ThreadSanitizer will consider this a race.
>
> When compiling with GIT_THREAD_SANITIZER, avoid writing to
> strbuf_slopbuf. Let's instead assert on the first byte of strbuf_slopbuf
> being '\0', since it ensures the promised invariant of "buf[len] ==
> '\0'". (Writing to strbuf_slopbuf is normally bad, but could become even
> more bad if we stop covering it up in strbuf_reset.)
>
> Signed-off-by: Martin Ågren <martin.ag...@gmail.com>
> ---
>  strbuf.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/strbuf.h b/strbuf.h
> index e705b94db..295654d39 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -153,7 +153,19 @@ static inline void strbuf_setlen(struct strbuf *sb, 
> size_t len)
>  /**
>   * Empty the buffer by setting the size of it to zero.
>   */
> +#ifdef GIT_THREAD_SANITIZER
> +#define strbuf_reset(sb)                                             \
> +     do {                                                            \
> +             struct strbuf *_sb = sb;                                \
> +             _sb->len = 0;                                           \
> +             if (_sb->buf == strbuf_slopbuf)                         \
> +                     assert(!strbuf_slopbuf[0]);                     \
> +             else                                                    \
> +                     _sb->buf[0] = '\0';                             \
> +     } while (0)
> +#else
>  #define strbuf_reset(sb)  strbuf_setlen(sb, 0)
> +#endif
>  
>  
>  /**

The strbuf_slopbuf[] is a shared resource that is expected by
everybody to stay a holder of a NUL.  Even though it is defined as
"char [1]", it in spirit ought to be considered const.  And from
that point of view, your new definition that is conditionally used
only when sanitizer is in use _is_ the more correct one than the
current "we do not care if it is slopbuf, we are writing \0 so it
will be no-op anyway" code.

I wonder if we excessively call strbuf_reset() in the real code to
make your version unacceptably expensive?  If not, I somehow feel
that using this version unconditionally may be a better approach.

What happens when a caller calls "strbuf_setlen(&sb, 0)" on a strbuf
that happens to have nothing and whose buffer still points at the
slopbuf (instead of calling _reset())?  Shouldn't your patch fix
that function instead, i.e. something like the following without the
above?  Is that make things noticeably and measurably too expensive?

Thanks.

 strbuf.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/strbuf.h b/strbuf.h
index 2075384e0b..1a77fe146a 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -147,7 +147,10 @@ static inline void strbuf_setlen(struct strbuf *sb, size_t 
len)
        if (len > (sb->alloc ? sb->alloc - 1 : 0))
                die("BUG: strbuf_setlen() beyond buffer");
        sb->len = len;
-       sb->buf[len] = '\0';
+       if (sb->buf != strbuf_slopbuf)
+               sb->buf[len] = '\0';
+       else
+               assert(!strbuf_slopbuf[0]);
 }
 
 /**

Reply via email to