On Wed, Nov 19, 2025 at 12:10 PM Jonathan Wakely <[email protected]> wrote:
>
> On Wed, 19 Nov 2025 at 00:55, Andrew Pinski
> <[email protected]> wrote:
> >
> > This improves the code generation slightly for std::string because of
> > aliasing. In many cases the length will be read again and the store of
> > the null character will cause the length to be re-read due to aliasing
> > requirements of the char type. So swapping around the stores will allow
> > the length not to have to be reloaded from memory and will allow
> > for more optimizations.
>
> Is that because data() could in theory point to *this (or some part of
> *this) and so writing the null character could overwrite a byte in the
> _M_length member?
> Yikes.
>
> I've previously suggested that we might want an attribute or something
> which says "this pointer doesn't alias *this", a bit like 'restrict'.
> That might help std::vector<char>, but wouldn't work for std::string
> because sometimes the data() pointer *does* point back into *this, to
> the this->_M_local_buf array. Maybe for std::string what we would want
> is an attribute that says the data() pointer doesn't alias anything
> that isn't char, so it might point to the char _M_local_buf[16] array,
> but it cannot alias anything else like _M_length because that has a
> different type.
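To make the aliasing concern above concrete, here is a minimal sketch. MiniString and both function names are hypothetical stand-ins, not libstdc++'s actual layout or code. It only illustrates why the order of the two stores can matter to the optimizer: a store through a char pointer may alias any object, including the length member.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a string representation (assumption: this is
// NOT the real std::string layout, just enough to show the two stores).
struct MiniString {
    char*       data;           // may point at local_buf or at heap memory
    std::size_t length;
    char        local_buf[16];
};

// Length stored first: the char store that follows may, as far as the
// compiler can prove, overwrite part of `length`, so the final read of
// s.length generally has to be reloaded from memory.
std::size_t set_length_before(MiniString& s, std::size_t n) {
    s.length = n;
    s.data[n] = '\0';
    return s.length;
}

// Length stored last: no store intervenes between writing and reading
// s.length, so the compiler is free to forward `n` directly.
std::size_t set_length_after(MiniString& s, std::size_t n) {
    s.data[n] = '\0';
    s.length = n;
    return s.length;
}
```

Both functions behave identically when data does not overlap length; the difference is only in what the compiler is allowed to assume, and whether a reload actually appears depends on the compiler and flags.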

Anything based on 'this' also does not survive inlining.  What we can encode
in the IL is "these two memory references do not alias" via MEM_REF_BASE/CLIQUE.
But I'm not sure how to expose that.  Maybe

 __mem_base (*this, 1).a

vs.

 __mem_base (b, 2).c

and, within a function, assign a unique clique and use it on the annotated
memory bases.  To cite tree-core.h:

    /* The following two fields are used for MEM_REF and TARGET_MEM_REF
       expression trees and specify known data non-dependences.  For
       two memory references in a function they are known to not
       alias if dependence_info.clique are equal and dependence_info.base
       are distinct.  Clique number zero means there is no information,
       clique number one is populated from function global information
       and thus needs no remapping on transforms like loop unrolling.  */
    struct {
      unsigned short clique;
      unsigned short base;
    } dependence_info;

When inlining, the cliques get re-mapped, so the information stays there
(and nothing is known about dependences between memory refs in caller
vs. callee).
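
The rule in that comment can be modelled in a few lines. This is only a sketch of the stated rule, not GCC's implementation: DependenceInfo and known_no_alias are hypothetical names for illustration.

```cpp
#include <cassert>

// Hypothetical mirror of the dependence_info pair quoted from tree-core.h.
struct DependenceInfo {
    unsigned short clique;
    unsigned short base;
};

// Returns true only when the pair alone proves the two references do not
// alias: same clique, distinct bases. Clique zero carries no information,
// so it never proves anything. A false result means "unknown", not "alias".
inline bool known_no_alias(DependenceInfo a, DependenceInfo b) {
    return a.clique != 0 && a.clique == b.clique && a.base != b.base;
}
```

With this model, two references in different cliques (for example one from the caller and one from an inlined callee after re-mapping) always yield "unknown".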

Richard.


>
> Although I think I've previously speculated that we'd get problems if
> people use a std::string as a dynamically-resizable buffer and
> placement-new objects into those chars. If we had an attribute saying
> that data() never aliases other objects, it would be a lie if other
> objects are created in those bytes :-(
>
> Anyway, this patch is OK for trunk (and the branches if you want) - thanks!
>
> >
> > Bootstrapped and tested on x86_64-linux-gnu.
> >
> > libstdc++-v3/ChangeLog:
> >
> >         * include/bits/basic_string.h (basic_string::_M_set_length): Swap
> >         around the order of traits_type::assign and _M_length so that
> >         _M_length is at the end.
> >
> > Signed-off-by: Andrew Pinski <[email protected]>
> > ---
> >  libstdc++-v3/include/bits/basic_string.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
> > index 8ae6569f501..c4b6b1064a9 100644
> > --- a/libstdc++-v3/include/bits/basic_string.h
> > +++ b/libstdc++-v3/include/bits/basic_string.h
> > @@ -269,8 +269,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
> >        void
> >        _M_set_length(size_type __n)
> >        {
> > -       _M_length(__n);
> >         traits_type::assign(_M_data()[__n], _CharT());
> > +       _M_length(__n);
> >        }
> >
> >        _GLIBCXX20_CONSTEXPR
> > --
> > 2.43.0
> >
>
