On Tue, Feb 23, 2016 at 8:38 PM, Torvald Riegel <trie...@redhat.com> wrote:
> I'd like to know, based on the GCC experience, how important we consider
> optimizations that may turn data dependencies of pointers into control
> dependencies.  I'm thinking about all optimizations or transformations
> that guess that a pointer might have a specific value, and then create
> (specialized) code that assumes this value and is only executed if the
> pointer actually has this value.  For example:
>
> int d[2] = {23, compute_something()};
>
> int compute(int v) {
>   if (likely(v == 23)) return 23;
>   else <lots of stuff>;
> }
>
> int bar() {
>   int *p = ptr.load(memory_order_consume);
>   ptrdiff_t reveal_that_p_is_in_d = p - d;
>   return compute(*p);
> }
>
> Could be transformed to (after inlining compute(), and specializing for
> the likely path):
>
> int bar() {
>   int *p = ptr.load(memory_order_consume);
>   if (p == d) return 23;
>   else <lots of stuff(*p)>;
> }

Note that if a user writes

  if (p == d)
   {
     ... do lots of stuff via p ...
   }

GCC might rewrite accesses through p as accesses through d and thus expose
exactly those opportunities.  Is that transform then invalid, or is the
code written by the user (which establishes the equivalence) to blame?
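To make the concern concrete, a minimal sketch (the names are
illustrative; ptr is assumed to be published elsewhere):

  #include <atomic>

  int d[2];
  std::atomic<int*> ptr;

  int before() {
    int *p = ptr.load(std::memory_order_consume);
    if (p == d)
      return *p;   // load carries a data dependency on the consume load
    return 0;
  }

  int after() {
    int *p = ptr.load(std::memory_order_consume);
    if (p == d)
      return d[0]; // p replaced by d: the load now depends only on the
                   // branch (a control dependency), so Power/ARM may
                   // speculate it ahead of the consume load
    return 0;
  }

On Power and ARM a control dependency orders subsequent stores but not
subsequent loads, which is why this rewrite matters here.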

There's a PR where this kind of equivalence leads to unexpected (wrong?)
points-to results, for example.

> Other potential examples that come to mind are de-virtualization, or
> feedback-directed optimizations that have observed at runtime that a
> certain pointer is likely to be always equal to some other pointer (e.g.,
> if p is almost always &d[0], and specializing for that).

Those are the cases that are quite important in practice.
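For concreteness, a hedged sketch of what speculative devirtualization
does (the class names are made up):

  #include <typeinfo>

  struct Base   { virtual int f() { return 0; } virtual ~Base() = default; };
  struct Likely : Base { int f() override { return 23; } };

  // Before: plain indirect call through the vtable.
  int call(Base *b) { return b->f(); }

  // After: profile feedback says b is almost always a Likely*.  The
  // compiler really tests the vtable pointer; typeid is used here only
  // to keep the sketch legal C++.
  int call_devirt(Base *b) {
    if (typeid(*b) == typeid(Likely))
      return static_cast<Likely*>(b)->Likely::f();  // direct, inlinable
    return b->f();                                  // fallback indirect call
  }

The guard is exactly the kind of pointer comparison that can turn a data
dependency into a control dependency.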

> Also, it would be interesting to me to know how often we may turn data
> dependencies into control dependencies in cases where this doesn't
> affect performance significantly.

I suppose we try to avoid that, but can we ever know for sure?  Speculative
devirtualization, for example, does this (with the intent that it _does_
matter, of course).

I suppose establishing non-dependence isn't an issue, as with the
vectorizer adding runtime dependence checks and applying versioning to get
a vectorized and a non-vectorized loop (in case there are dependences)?
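As an illustration of that versioning pattern, a hand-written sketch (not
actual vectorizer output; strictly speaking, comparing unrelated pointers
is not portable C++, but this is what the generated check amounts to):

  // Version the loop on a runtime check that a[0..n) and b[0..n) do
  // not overlap.
  void add(int *a, int *b, int n) {
    if (a + n <= b || b + n <= a) {
      for (int i = 0; i < n; ++i)  // proven disjoint: vectorizable
        a[i] += b[i];
    } else {
      for (int i = 0; i < n; ++i)  // possible overlap: scalar fallback
        a[i] += b[i];
    }
  }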

> The background for this question is Paul McKenney's recently updated
> proposal for a different memory_order_consume specification:
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0190r0.pdf
>
> In a nutshell, this requires a compiler to either prove that a pointer
> value is not carrying a dependency (simplified: carrying a dependency
> means the value somehow originates from a memory_order_consume load), or
> to conservatively assume that it does; if it does, the compiler must not
> turn data dependencies into control dependencies in generated code.
> (The data dependencies, in contrast to control dependencies, enforce
> memory ordering on archs such as Power and ARM; these orderings then
> allow for not having to use an acquire HW barrier in the generated
> code.)
>
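For reference, the pattern the proposal wants to keep cheap is the
RCU-style publish/consume idiom; a sketch with illustrative names:

  #include <atomic>

  struct node { int data; };
  std::atomic<node*> head{nullptr};

  void publish(node *n) {
    n->data = 42;
    head.store(n, std::memory_order_release);  // publish initialized node
  }

  int reader() {
    node *n = head.load(std::memory_order_consume);
    if (n)
      return n->data;  // address-dependent load: on Power/ARM it is
                       // ordered after the consume load without a barrier
    return 0;
  }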
> Given that such a proof will likely be hard for a compiler (dependency
> chains propagate through assignments to variables on the heap and stack,
> chains are not marked in the code, and points-to analysis can be hard),
> a compiler faces a trade-off between either:
> (1) trying to support this memory_order_consume specification and likely
> disallowing all transformations that change data dependencies into
> control dependencies, or
> (2) not supporting the proposal, simply emitting memory_order_acquire
> code instead, and getting no new constraints on transformations in return
> (i.e., what we do for memory_order_consume today).
>
> A compiler could let users make this choice, but this will be hard for
> users too, and the compiler would still have to pick a default.
>
> Therefore, it would be good to know how important such transformations
> or optimizations are in practice.  If they are, it means either somewhat
> slower code everywhere (or at least having to change much in today's
> compilers), or not supporting the new memory_order_consume (at least not
> in the compiler's default setting).
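For reference, (2) is what compilers do today: the consume load is simply
strengthened to an acquire load (a sketch; the Power sequences in the
comments are the commonly cited mappings):

  #include <atomic>

  std::atomic<int*> ptr;

  int *load_consume() {
    // Today this compiles exactly like load_acquire below, e.g. on
    // Power ld; cmp; bc; isync (or ld; lwsync), instead of relying on
    // dependency ordering from a plain ld.
    return ptr.load(std::memory_order_consume);
  }

  int *load_acquire() {
    return ptr.load(std::memory_order_acquire);
  }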

IMHO all the memory order semantics are too complicated already, so what's
the reason to complicate them even more...?  See for example how our alias
analysis (which is supposed to also honor barriers) gives up completely
on atomics.

Richard.

> Thanks for any opinions.
>
