On Wed, Nov 19, 2025 at 1:45 PM Richard Biener
<[email protected]> wrote:
>
> On Wed, Nov 19, 2025 at 12:10 PM Jonathan Wakely <[email protected]> wrote:
> >
> > On Wed, 19 Nov 2025 at 00:55, Andrew Pinski
> > <[email protected]> wrote:
> > >
> > > This slightly improves code generation for std::string, because of
> > > aliasing. In many cases the length is read again later, and the store of
> > > the null character forces it to be reloaded, because a store through a
> > > char type may alias any object. Storing the length last means it does
> > > not have to be reloaded from memory, which allows more optimizations.
> >
> > Is that because data() could in theory point to *this (or some part of
> > *this) and so writing the null character could overwrite a byte in the
> > _M_length member?
> > Yikes.
> >
> > I've previously suggested that we might want an attribute or something
> > which says "this pointer doesn't alias *this", a bit like 'restrict'.
> > That might help std::vector<char>, but wouldn't work for std::string
> > because sometimes the data() pointer *does* point back into *this, to
> > the this->_M_local_buf array. Maybe for std::string what we would want
> > is an attribute that says the data() pointer doesn't alias anything
> > that isn't char, so it might point to the char _M_local_buf[16] array,
> > but it cannot alias anything else like _M_length because that has a
> > different type.
>
> Anything based on 'this' also does not survive inlining.  What we can encode
> in the IL is "these two memory references do not alias" via MEM_REF_BASE/CLIQUE.
> But I'm not sure how to expose that.  Maybe
>
>  __mem_base (*this, 1).a
>
> vs.
>
>  __mem_base (b, 2).c
>
> and, within a function, assign a unique clique and use it on the annotated
> memory bases.  To cite tree-core.h:
>
>     /* The following two fields are used for MEM_REF and TARGET_MEM_REF
>        expression trees and specify known data non-dependences.  For
>        two memory references in a function they are known to not
>        alias if dependence_info.clique are equal and dependence_info.base
>        are distinct.  Clique number zero means there is no information,
>        clique number one is populated from function global information
>        and thus needs no remapping on transforms like loop unrolling.  */
>     struct {
>       unsigned short clique;
>       unsigned short base;
>     } dependence_info;
>
> When inlining, the cliques get re-mapped, so the information stays there
> (and nothing is known about dependences between memory refs in caller
> vs. callee).
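
To make the rule in the tree-core.h comment concrete, here is a minimal
sketch of the check it describes.  DependenceInfo and known_not_to_alias
are made-up names for illustration; this models only the wording of the
comment and is not GCC's actual alias-oracle code.

```cpp
// Hypothetical model of the dependence_info pair quoted above.
struct DependenceInfo {
  unsigned short clique;
  unsigned short base;
};

// Per the comment: two references are known not to alias if their cliques
// are equal and their bases are distinct.  Clique zero carries no
// information, so it never proves independence.
bool known_not_to_alias(const DependenceInfo &a, const DependenceInfo &b) {
  return a.clique != 0 && a.clique == b.clique && a.base != b.base;
}
```

Anything else (different cliques, or the same base) simply means "no
information", not "these do alias".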

Then there is DECL_NONADDRESSABLE_P on a FIELD_DECL, which for, say,

  struct{
    int len;
    char *data;
  };

could say that 'len' cannot have its address taken and thus the 'data' member
cannot point to it.  I'm not sure whether it's possible to take the address of
the length field of a std::string, though, and this mechanism isn't specific
enough to disallow only 'data' from pointing to 'len'.
DECL_NONADDRESSABLE_P is used by Ada (as is TYPE_NONALIASED_COMPONENT).
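
As a minimal sketch of the underlying problem, using a struct like the one
above: the names Str, set_length_len_first, set_length_len_last and
char_store_can_change_len are invented for illustration, and this is not
libstdc++'s real layout.  It only shows why a char store after the length
store forces a reload, assuming nothing tells the compiler that 'data'
cannot point at 'len'.

```cpp
#include <cstddef>

// Hypothetical stand-in for a string header (illustration only).
struct Str {
  std::size_t len;
  char *data;
};

// Length first, then terminator.  The char store may alias s.len, so the
// compiler generally has to reload s.len for the return value.
std::size_t set_length_len_first(Str &s, std::size_t n) {
  s.len = n;
  s.data[n] = '\0';
  return s.len;
}

// Terminator first, then length.  No store sits between the write of
// s.len and its read, so the value n can be forwarded without a reload.
std::size_t set_length_len_last(Str &s, std::size_t n) {
  s.data[n] = '\0';
  s.len = n;
  return s.len;
}

// Shows that the aliasing is legal: a char pointer may point at the bytes
// of len, and a store through it really does change len.
bool char_store_can_change_len() {
  Str s{0, nullptr};
  s.data = reinterpret_cast<char *>(&s.len);
  s.len = ~static_cast<std::size_t>(0);  // every byte is 0xFF
  s.data[0] = '\0';                      // clears one byte of len
  return s.len != ~static_cast<std::size_t>(0);
}
```

With an ordinary buffer both orderings return n; the difference is only in
what the optimizer is allowed to assume about s.len after the char store.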

Richard.

> Richard.
>
>
> >
> > Although I think I've previously speculated that we'd get problems if
> > people use a std::string as a dynamically-resizable buffer and
> > placement-new objects into those chars. If we had an attribute saying
> > that data() never aliases other objects, it would be a lie if other
> > objects are created in those bytes :-(
> >
> > Anyway, this patch is OK for trunk (and the branches if you want) - thanks!
> >
> > >
> > > Bootstrapped and tested on x86_64-linux-gnu.
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > >         * include/bits/basic_string.h (basic_string::_M_set_length): Swap
> > >         around the order of traits_type::assign and _M_length so that
> > >         _M_length is at the end.
> > >
> > > Signed-off-by: Andrew Pinski <[email protected]>
> > > ---
> > >  libstdc++-v3/include/bits/basic_string.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
> > > index 8ae6569f501..c4b6b1064a9 100644
> > > --- a/libstdc++-v3/include/bits/basic_string.h
> > > +++ b/libstdc++-v3/include/bits/basic_string.h
> > > @@ -269,8 +269,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
> > >        void
> > >        _M_set_length(size_type __n)
> > >        {
> > > -       _M_length(__n);
> > >         traits_type::assign(_M_data()[__n], _CharT());
> > > +       _M_length(__n);
> > >        }
> > >
> > >        _GLIBCXX20_CONSTEXPR
> > > --
> > > 2.43.0
> > >
> >
