https://bugs.llvm.org/show_bug.cgi?id=35896

            Bug ID: 35896
           Summary: MSAN read-past-string-end in CXString
           Product: clang
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: libclang
          Assignee: unassignedclangb...@nondot.org
          Reporter: st...@obrien.cc
                CC: kli...@google.com, llvm-bugs@lists.llvm.org

Class `CXString` has a method `createRef(llvm::StringRef)` that tries to
reference the bytes of an existing string, without copying, if possible.  (We
can assume the pre-existing string bytes' memory remains unchanged, allocated,
and otherwise "good".)

A `StringRef` represents a run of sequential chars in memory; whereas a
`CXString` always points to a C-like string, i.e., there must be an array
somewhere of bytes, terminated by a NUL character.

`StringRef` has no such NUL-terminator requirement.  So `createRef`, which
wants to recycle existing memory, may be handed a NUL-terminated string (which
it can reuse); otherwise it has to copy the non-NUL-terminated bytes into a new
array, with one extra byte for the terminator.

The trouble is this: `CXString` checks the byte at `str[stringLength]`, which
is technically out-of-bounds for the string.  If that byte is 0 then it's a
NUL-terminated C string and it can be reused (otherwise it has to be copied).

Since that access is one past the bounds of the string, the byte it reads may
be uninitialized, and this raises an MSAN error.
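
To make the problematic check concrete, here is a minimal sketch of the
reuse-or-copy decision.  It is not the libclang source: `std::string_view`
stands in for `llvm::StringRef`, and `RefOrCopy` and `createRefSketch` are
hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for CXString: either borrows the caller's bytes
// or owns a NUL-terminated copy of them.
struct RefOrCopy {
  const char *borrowed = nullptr;  // non-null when the caller's bytes are reused
  std::string owned;               // holds the copy otherwise
  const char *c_str() const { return borrowed ? borrowed : owned.c_str(); }
};

// Sketch of the check described above (assumed shape, not the real code).
inline RefOrCopy createRefSketch(std::string_view s) {
  assert(s.data() != nullptr);  // precondition of this sketch
  RefOrCopy r;
  // This read is one past the last byte the view covers.  It is harmless
  // only if that byte happens to be valid, initialized memory.
  if (s.data()[s.size()] == '\0') {
    r.borrowed = s.data();      // already a C string: reuse without copying
  } else {
    r.owned = std::string(s);   // copy; std::string appends the terminator
  }
  return r;
}
```

A view covering all of a string literal takes the reuse path, while a view
covering only a prefix of it takes the copy path.  In both of those cases the
extra byte belongs to the underlying array; the report concerns inputs where
it does not.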

One easy fix is to always copy the string data and never attempt to reuse bytes
from a `StringRef`.  I fear that increased byte-copies will waste both memory
and CPU.  (As correct as this approach is, it's inefficient.)
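
A sketch of that always-copy approach, under the same assumptions
(`std::string_view` in place of `llvm::StringRef`; `createDupSketch` is a
hypothetical name), might look like this:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: always duplicate the bytes and never look at
// s.data()[s.size()].
inline std::string createDupSketch(std::string_view s) {
  std::string copy(s);  // copies exactly s.size() bytes
  // std::string guarantees a terminator after the copied bytes.
  assert(copy.c_str()[copy.size()] == '\0');
  return copy;
}
```

Every call pays for an allocation (for strings beyond any small-string buffer)
and a copy of `s.size()` bytes, which is the cost I'm worried about.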

Another is to make `CXString`s look more like `StringRef`s, and include a
length / end-of-string pointer, to avoid the NUL requirement.  But as this
library is used primarily from another language (via the `cindex` Python
bindings), I'm not sure whether this is feasible.
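
For illustration only, a length-carrying representation could be sketched as
below.  The struct layout and the names `SizedStringSketch`, `createSizedRef`
and `toView` are assumptions of mine, not the actual `CXString` definition.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical plain-data layout: pointer plus explicit length, so no
// NUL terminator is needed and no byte past the end is ever read.
struct SizedStringSketch {
  const char *data;    // not required to be NUL-terminated
  std::size_t length;  // number of valid bytes at `data`
};

// Borrow the bytes of an existing view without copying or inspecting them.
inline SizedStringSketch createSizedRef(std::string_view s) {
  return SizedStringSketch{s.data(), s.size()};
}

// Recover a view over exactly the bytes that were borrowed.
inline std::string_view toView(const SizedStringSketch &s) {
  assert(s.data != nullptr || s.length == 0);
  return std::string_view(s.data, s.length);
}
```

Any consumer that currently expects a C string, including bindings in other
languages, would have to read `length` instead of scanning for a NUL.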
