https://bugs.kde.org/show_bug.cgi?id=435732

Paul Floyd <pjfl...@wanadoo.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pjfl...@wanadoo.fr

--- Comment #1 from Paul Floyd <pjfl...@wanadoo.fr> ---
Just a few comments on this. I'm not too sure how well the std::string
heuristic works these days.


Here is the code from mc_leakcheck.c:

   if (HiS(LchStdString, heur_set)) {
      // Detects inner pointers to Std::String for layout being
      //     length capacity refcount char_array[] \0
      // where ptr points to the beginning of the char_array.
      // Note: we check definedness for length and capacity but
      // not for refcount, as refcount size might be smaller than
      // a SizeT, giving a uninitialised hole in the first 3 SizeT.
      if ( ptr == ch->data + 3 * sizeof(SizeT)
           && MC_(is_valid_aligned_word)(ch->data + sizeof(SizeT))) {
         const SizeT capacity = *((SizeT*)(ch->data + sizeof(SizeT)));
         if (3 * sizeof(SizeT) + capacity + 1 == ch->szB
            && MC_(is_valid_aligned_word)(ch->data)) {
            const SizeT length = *((SizeT*)ch->data);
            if (length <= capacity) {
               // ??? could check there is no null byte from ptr to ptr+length-1
               // ???    and that there is a null byte at ptr+length.
               // ???
               // ??? could check that ch->allockind is MC_AllocNew ???
               // ??? probably not a good idea, as I guess stdstring
               // ??? allocator can be done via custom allocator
               // ??? or even a call to malloc ????
               VG_(set_fault_catcher) (prev_catcher);
               return LchStdString;
            }
         }
      }
   }
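
To make the size arithmetic in that check concrete, here is a minimal
standalone sketch of the same layout test. It is not the Memcheck code: the
names looks_like_cow_string and make_fake_rep are hypothetical, the block is a
plain byte buffer rather than an MC_Chunk, and the definedness checks
(MC_(is_valid_aligned_word)) are left out because they need Valgrind itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the Memcheck check: true when `ptr` sits right
// after a 3-word header in a block whose size matches
//     length capacity refcount char_array[capacity] \0
bool looks_like_cow_string(const unsigned char* block, std::size_t block_size,
                           const unsigned char* ptr)
{
    assert(block != nullptr);
    const std::size_t header = 3 * sizeof(std::size_t);
    if (block_size < header + 1 || ptr != block + header)
        return false;

    std::size_t length = 0, capacity = 0;
    std::memcpy(&length, block, sizeof length);
    std::memcpy(&capacity, block + sizeof length, sizeof capacity);

    // Same test as 3 * sizeof(SizeT) + capacity + 1 == szB, rearranged so
    // that a garbage capacity cannot overflow the addition.
    return capacity == block_size - header - 1 && length <= capacity;
}

// Builds a zero-filled fake block with the given length and capacity fields.
std::vector<unsigned char> make_fake_rep(std::size_t length,
                                         std::size_t capacity)
{
    std::vector<unsigned char> block(3 * sizeof(std::size_t) + capacity + 1, 0);
    std::memcpy(block.data(), &length, sizeof length);
    std::memcpy(block.data() + sizeof length, &capacity, sizeof capacity);
    return block;
}
```

A block built with make_fake_rep(5, 10) passes only when ptr is exactly
3 * sizeof(size_t) bytes past the start of the block, which is the property
the heuristic relies on.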

Modern C++ does not use a std::string layout that looks like
//     length capacity refcount char_array[] \0

In fact, since C++11, reference-counted strings have been forbidden. GCC was a
bit slow to implement this, which was done in GCC 5. With GCC 6 the default
C++ compilation mode changed to C++14. This is fairly important as GCC
maintained backwards compatibility using C++ namespaces, so it is possible to
use either the old or the new layout
[https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html]. GCC 11
switched to defaulting to C++17, which could change things again
(but I'm not aware of any such changes).
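
For code that is being compiled, the dual ABI page linked above documents the
_GLIBCXX_USE_CXX11_ABI macro, so a translation unit can report which layout it
was built with. The sketch below assumes that macro, plus __GLIBCXX__ and
_LIBCPP_VERSION to identify the library; the enum and function names are
made up for illustration. This only answers the question at compile time. It
does not help a tool that only sees an already-built binary.

```cpp
#include <cassert>
#include <string>   // any standard header pulls in the library config macros

// Hypothetical classification of the std::string implementation in use.
enum class StringAbi { Libcxx, LibstdcxxCxx11, LibstdcxxCow, Unknown };

// Reports which std::string layout this translation unit was compiled with.
constexpr StringAbi compiled_string_abi()
{
#if defined(_LIBCPP_VERSION)
    return StringAbi::Libcxx;
#elif defined(__GLIBCXX__) && defined(_GLIBCXX_USE_CXX11_ABI) \
      && _GLIBCXX_USE_CXX11_ABI
    return StringAbi::LibstdcxxCxx11;
#elif defined(__GLIBCXX__)
    return StringAbi::LibstdcxxCow;
#else
    return StringAbi::Unknown;
#endif
}

// Human-readable name for each case.
const char* abi_name(StringAbi abi)
{
    switch (abi) {
    case StringAbi::Libcxx:         return "libc++";
    case StringAbi::LibstdcxxCxx11: return "libstdc++ (new cxx11 ABI)";
    case StringAbi::LibstdcxxCow:   return "libstdc++ (old reference-counted ABI)";
    case StringAbi::Unknown:        return "unknown";
    }
    assert(false && "unhandled StringAbi value");
    return "unknown";
}
```

Calling abi_name(compiled_string_abi()) gives the name for the current build;
compiling with -D_GLIBCXX_USE_CXX11_ABI=0 on libstdc++ should select the old
layout.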

So what are the newer layouts?

With GCC / libstdc++ it is like this:


   // _Alloc_hider just contains
   //  pointer _M_p; // The actual data.
   _Alloc_hider      _M_dataplus;
   size_type         _M_string_length;

   enum { _S_local_capacity = 15 / sizeof(_CharT) };

   union
   {
     _CharT           _M_local_buf[_S_local_capacity + 1];
     size_type        _M_allocated_capacity;
   };
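
As a rough illustration of what those members imply for char, here is a
hypothetical mirror struct (StringMirror is not a libstdc++ type, and the
allocator part of _Alloc_hider is assumed to take no space). It shows the
15-byte local capacity and that a string counts as "local" when its data
pointer points at its own buffer.

```cpp
#include <cassert>
#include <cstddef>

// Number of characters that fit in the local buffer, as in the enum above.
template <class CharT>
constexpr std::size_t local_capacity() { return 15 / sizeof(CharT); }

// Hypothetical mirror of the members shown above, specialised for char.
struct StringMirror {
    char*       data;     // plays the role of _M_dataplus._M_p
    std::size_t length;   // plays the role of _M_string_length
    union {
        char        local_buf[local_capacity<char>() + 1];
        std::size_t allocated_capacity;
    };
};

static_assert(sizeof(StringMirror) == sizeof(char*) + sizeof(std::size_t) + 16,
              "pointer + length + 16-byte union");

// The string is local when the data pointer targets its own buffer.
bool mirror_is_local(const StringMirror& m)
{
    assert(m.data != nullptr);
    return m.data == m.local_buf;
}

// Capacity is implicit for local strings and stored for heap strings.
std::size_t mirror_capacity(const StringMirror& m)
{
    return mirror_is_local(m) ? local_capacity<char>() : m.allocated_capacity;
}
```

With this layout there is no length/capacity/refcount header in front of the
characters in either case.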

With clang/libc++, see here:

https://eu90h.github.io/cpp-strings.html

Roughly speaking, std::string now uses something called the "short string
optimization", which avoids any heap allocation for short strings:

struct __long {
   size_type __cap_;
   size_type __size_;
   pointer   __data_;
};


struct __short {
   union {
         unsigned char __size_;
         value_type __lx;
   };
   value_type __data_[__min_cap];
};

where __min_cap is 23 (for char on a 64-bit platform).
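
The effect of the short string optimization can be observed from outside,
without relying on any library internals. The sketch below uses only the
public std::string interface; the helper names stored_inline and
first_heap_length are made up, and the length it reports depends on the
library and ABI in use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// True when the characters live inside the std::string object itself.
bool stored_inline(const std::string& s)
{
    const auto obj  = reinterpret_cast<std::uintptr_t>(&s);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= obj && data < obj + sizeof(std::string);
}

// Smallest length whose characters are no longer stored in the object.
std::size_t first_heap_length()
{
    // n characters plus the terminating \0 cannot fit in the object once
    // n reaches sizeof(std::string), so the loop always returns.
    for (std::size_t n = 0; n <= sizeof(std::string); ++n) {
        if (!stored_inline(std::string(n, 'x')))
            return n;
    }
    assert(false && "unreachable");
    return sizeof(std::string);
}
```

With the layouts described above I would expect first_heap_length() to be one
more than the inline capacity, and 0 with the old reference-counted layout,
where the characters are never inside the object.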


Either way, the actual data pointer is not where the current code assumes that
it is. I don't know of a way to tell which ABI is being used.
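
One way to see where the data pointer ends up relative to the heap block is a
tracking allocator. This is only a sketch: TrackingAllocator, g_last_block and
data_offset_in_heap_block are hypothetical names, and it assumes the string
makes a single allocation when constructed from a count and a character.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Address of the most recent block handed out by TrackingAllocator.
static void* g_last_block = nullptr;

// Minimal allocator that records the start of each block it returns.
template <class T>
struct TrackingAllocator {
    using value_type = T;
    TrackingAllocator() = default;
    template <class U>
    TrackingAllocator(const TrackingAllocator<U>&) noexcept {}
    T* allocate(std::size_t n)
    {
        g_last_block = ::operator new(n * sizeof(T));
        return static_cast<T*>(g_last_block);
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};
template <class T, class U>
bool operator==(const TrackingAllocator<T>&, const TrackingAllocator<U>&)
{ return true; }
template <class T, class U>
bool operator!=(const TrackingAllocator<T>&, const TrackingAllocator<U>&)
{ return false; }

using TrackedString =
    std::basic_string<char, std::char_traits<char>, TrackingAllocator<char>>;

// Offset of data() from the start of the heap block, or -1 when the string
// of length n did not allocate at all (short string optimization).
std::ptrdiff_t data_offset_in_heap_block(std::size_t n)
{
    g_last_block = nullptr;
    TrackedString s(n, 'x');
    assert(s.size() == n);
    if (g_last_block == nullptr)
        return -1;
    return s.data() - static_cast<const char*>(g_last_block);
}
```

For a long string I would expect an offset of 0 with the newer layouts, so
the pointer held by the program is a start pointer rather than an inner
pointer, and 3 * sizeof(size_t) only with the old reference-counted layout
that the heuristic was written for.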
