[Bug libgcc/91053] New: __builtin___clear_cache can fail

2019-07-02 Thread oth+gccbugs at google dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91053

            Bug ID: 91053
           Summary: __builtin___clear_cache can fail
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgcc
          Assignee: unassigned at gcc dot gnu.org
          Reporter: oth+gccbugs at google dot com
  Target Milestone: ---

This issue arises from using libgcc in the Android Runtime's Just-In-Time
compiler. The Android Runtime uses __builtin___clear_cache() for JIT cache
maintenance and we've become aware that this builtin can silently fail. This
leaves the CPU caches in an unknown state, potentially leading to a crash from
the execution of unintended instruction sequences.

The specific case where we've observed this failure is for devices with ARMv7
Linux kernels. The libgcc clear_cache builtin calls the cacheflush() system
call which bottoms out in v7_coherent_user_range():

  https://github.com/torvalds/linux/blob/master/arch/arm/mm/cache-v7.S#L253

The important detail in that code is that the blocks of code within USER()
macros will call the fault handler, labelled 9001, if the cache operation
causes a fault. The return value from v7_coherent_user_range is then -EFAULT.
When this happens after code updates in the JIT cache, crashes can be expected.
For example, if we fail to invalidate the whole of the instruction cache
range, the CPU can execute a mix-and-match of old and new instructions.

This issue is quite hard to reproduce and typically only occurs under memory
pressure.

The documentation for __builtin___clear_cache() does not mention the
possibility of failure:

  https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

but this is documented behaviour of the cacheflush() system call:

  http://man7.org/linux/man-pages/man2/cacheflush.2.html

Potential fixes could include:

  1) no change to __builtin___clear_cache(), but update the documentation to
indicate that failure is possible on systems where cache flushing is a
privileged operation. On these systems callers of the builtin should clear
errno before the call and check it afterwards.

  2) change __builtin___clear_cache() to return an error code.

  3) a fix for Linux kernels so cacheflush() cannot fail. [Outside the scope of
this bug.]
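As a rough illustration of the calling pattern that fix (1) would ask of
callers, here is a minimal sketch. The wrapper name clear_cache_checked is
hypothetical. It also assumes that the implementation behind the builtin
reports a failed flush through errno, which is what the documentation change
would have to promise; this has not been verified against libgcc.

```c
#include <errno.h>

/* Hypothetical wrapper: returns 0 if no error was reported, -1 otherwise.
 * ASSUMPTION: a failed cache flush is visible to the caller via errno.
 * If the implementation discards the system call result without touching
 * errno, this check cannot detect the failure. */
static int clear_cache_checked(char *begin, char *end)
{
    errno = 0;                            /* clear any stale error */
    __builtin___clear_cache(begin, end);  /* builtin itself returns void */
    return errno == 0 ? 0 : -1;           /* inspect errno afterwards */
}
```

A JIT would call this after writing new code into [begin, end) and treat a
non-zero result as "do not execute this range yet".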

Our workaround for now is to special-case ARMv7: we call the cacheflush()
system call directly and check for errors (http://r.android.com/989545).
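A minimal sketch of what such a workaround can look like follows. It is not
the code from the Android change. The helper name flush_code_range is
hypothetical, and issuing the call through syscall() with __ARM_NR_cacheflush
is an assumption about the C library in use (some libraries also provide a
cacheflush() wrapper). On other targets the sketch falls back to the builtin.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>

#if defined(__linux__) && defined(__arm__)
#include <unistd.h>
#include <sys/syscall.h>
#include <asm/unistd.h>   /* __ARM_NR_cacheflush */
#endif

/* Hypothetical helper: flush the caches for [begin, end).
 * Returns 0 on success, -1 on failure with errno set (e.g. EFAULT). */
static int flush_code_range(void *begin, void *end)
{
#if defined(__linux__) && defined(__arm__)
    /* 32-bit ARM Linux: make the system call ourselves so that the
     * result is not lost. The third argument (flags) must be 0. */
    if (syscall(__ARM_NR_cacheflush, begin, end, 0) != 0)
        return -1;
    return 0;
#else
    /* Other targets: the builtin gives no error indication, so the
     * best we can do here is call it and report success. */
    __builtin___clear_cache((char *)begin, (char *)end);
    return 0;
#endif
}
```

What to do on failure is up to the caller; retrying the flush or refusing to
publish the new code are both options, and which is right depends on the JIT.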

[Bug libgcc/91053] __builtin___clear_cache can fail

2019-07-05 Thread oth+gccbugs at google dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91053

--- Comment #2 from Orion Hodson ---
For sure the goal wasn't to suggest that this was due to a privileged
operation.