Hi all,

I have tried to make the dependent_ptr qualification act as volatile during
the RTL passes to bypass the RTL optimizations for now. Here is the patch
https://github.com/AKG001/gcc/commit/14c05ae546554f822f667fdb72080b7fe52fea32
For this (https://github.com/AKG001/test/blob/master/dependency_breaking.c)
example, I have been able to get this (
https://github.com/AKG001/test/blob/master/dependency_breaking.c.317r.final)
.final code. I believe that there are no dependencies that are broken.
Hoping someone else could also confirm if I have mistaken somewhere. I have
given the dependent_ptr qualification to only memory type objects for now.
The flag /d represents that this memory represents the dependent pointer in
.final code.
I am checking up the other examples form document P0190R4 to see what other
things need to be done. Let me know what you guys think.

-Akshat

On Mon, Jul 22, 2019 at 3:16 PM Richard Biener <richard.guent...@gmail.com>
wrote:

> On Mon, Jul 22, 2019 at 10:54 AM Akshat Garg <xks...@gmail.com> wrote:
>
>> On Mon, Jul 22, 2019 at 2:11 PM Richard Biener <
>> richard.guent...@gmail.com> wrote:
>>
>>> On Mon, Jul 22, 2019 at 2:27 AM Akshat Garg <xks...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>> Consider part of an example(figure 20) from doc P0190R4(
>>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0190r4.pdf)
>>>> shown below:
>>>>
>>>> 1.  void thread1 (void)
>>>> 2.  {
>>>> 3.    int * volatile p;
>>>> 4.    p = rcu_dereference(gip);
>>>> 5.    if (p)
>>>> 6.    assert(*(p+p[0]) == 42);
>>>> 7.  }
>>>> The .gimple code produced is :
>>>>
>>>> 1.  thread1 ()
>>>> 2.  {
>>>> 3.   atomic int * D.1992;
>>>> 4.    int * volatile p;
>>>> 5.  {
>>>> 6.    atomic int * * __atomic_load_ptr;
>>>> 7.   atomic int * __atomic_load_tmp;
>>>> 8.    try
>>>> 9.     {
>>>> 10.        __atomic_load_ptr = &gip;
>>>> 11.        _1 = __atomic_load_8 (__atomic_load_ptr, 1);
>>>> 12.        _2 = (atomic int *) _1;
>>>> 13.        __atomic_load_tmp = _2;
>>>> 14.        D.1992 = __atomic_load_tmp;
>>>> 15.     }
>>>> 16.    finally
>>>> 17.      {
>>>> 18.        __atomic_load_tmp = {CLOBBER};
>>>> 19.      }
>>>> 20.  }
>>>> 21.   p = D.1992;
>>>> 22.   p.2_3 = p;
>>>> 23.  if (p.2_3 != 0B) goto <D.1994>; else goto <D.1995>;
>>>> 24.  <D.1994>:
>>>> 25.   p.3_4 = p;
>>>> 26.  p.4_5 = p;
>>>> 27.   _6 = *p.4_5;
>>>> 28.  _7 = (long unsigned int) _6;
>>>> 29.  _8 = _7 * 4;
>>>> 30.  _9 = p.3_4 + _8;
>>>> 31.  _10 = *_9;
>>>> 32.  _11 = _10 == 42;
>>>> 33.  _12 = (int) _11;
>>>> 34.  assert (_12);
>>>> 35.  <D.1995>:
>>>> 36. }
>>>>
>>>> assert at line 34 in .gimple code still breaks the dependency given by
>>>> the user. I believe, there should be some ssa defined variable of p or p
>>>> itself in assert. This is happening when I am considering pointer as
>>>> volatile qualified. If I consider it as _Dependent_ptr qualified then it
>>>> surely produces the broken dependency code. Let me know, if I wrong
>>>> somewhere.
>>>>
>>>>
>>> p appears as memory here which we load its value to p.3_4 which we then
>>> offset by _8 and load from the
>>> computed address into _10 which then appears in the assert condition.  I
>>> think that's as good as it can
>>> get ...
>>>
>>> Richard.
>>>
>>
>> Thank you for your reply. For, the same example above, consider this (
>> https://github.com/AKG001/rtl_opt/blob/master/p0190r4_fig20.c.239r.cse1#L402)
>> instruction at rtl level changed form this (
>> https://github.com/AKG001/rtl_opt/blob/master/p0190r4_fig20.c.239r.cse1#L231)
>> during the cse pass. The variable p.2_3 gets replaced by a temporary _1 but
>> _1 is not any dependent pointer where, p.2_3 was. Is this also not breaking
>> any dependencies
>>
>
> I'm not sure.  In general CSE can break dependences.  If the dependent
> pointer chain needs to conver multiple levels of
> indirections from the original atomic operation you need to make sure to
> not expose atomics as CSEable.  Thus on
> RTL have them all UNSPECs.
>
> Richard.
>
>
>> -Akshat
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Jul 17, 2019 at 4:23 PM Akshat Garg <xks...@gmail.com> wrote:
>>>>
>>>>> On Tue, Jul 2, 2019 at 9:06 PM Jason Merrill <ja...@redhat.com> wrote:
>>>>>
>>>>>> On Mon, Jul 1, 2019 at 8:59 PM Paul E. McKenney <
>>>>>> paul...@linux.ibm.com> wrote:
>>>>>> >
>>>>>> > On Tue, Jul 02, 2019 at 05:58:48AM +0530, Akshat Garg wrote:
>>>>>> > > On Tue, Jun 25, 2019 at 9:49 PM Akshat Garg <xks...@gmail.com>
>>>>>> wrote:
>>>>>> > >
>>>>>> > > > On Tue, Jun 25, 2019 at 4:04 PM Ramana Radhakrishnan <
>>>>>> > > > ramana....@googlemail.com> wrote:
>>>>>> > > >
>>>>>> > > >> On Tue, Jun 25, 2019 at 11:03 AM Akshat Garg <xks...@gmail.com>
>>>>>> wrote:
>>>>>> > > >> >
>>>>>> > > >> > As we have some working front-end code for _Dependent_ptr,
>>>>>> What should
>>>>>> > > >> we do next? What I understand, we can start adding the library
>>>>>> for
>>>>>> > > >> dependent_ptr and its functions for C corresponding to the
>>>>>> ones we created
>>>>>> > > >> as C++ template library. Then, after that, we can move on to
>>>>>> generating the
>>>>>> > > >> assembly code part.
>>>>>> > > >> >
>>>>>> > > >>
>>>>>> > > >>
>>>>>> > > >> I think the next step is figuring out how to model the
>>>>>> Dependent
>>>>>> > > >> pointer information in the IR and figuring out what
>>>>>> optimizations to
>>>>>> > > >> allow or not with that information. At this point , I suspect
>>>>>> we need
>>>>>> > > >> a plan on record and have the conversation upstream on the
>>>>>> lists.
>>>>>> > > >>
>>>>>> > > >> I think we need to put down a plan on record.
>>>>>> > > >>
>>>>>> > > >> Ramana
>>>>>> > > >
>>>>>> > > > [CCing gcc mailing list]
>>>>>> > > >
>>>>>> > > > So, shall I start looking over the pointer optimizations only
>>>>>> and see what
>>>>>> > > > information we may be needed on the same examples in the IR
>>>>>> itself?
>>>>>> > > >
>>>>>> > > > - Akshat
>>>>>> > > >
>>>>>> > > I have coded an example where equality comparison kills
>>>>>> dependency from the
>>>>>> > > document P0190R4 as shown below :
>>>>>> > >
>>>>>> > > 1. struct rcutest rt = {1, 2, 3};
>>>>>> > > 2. void thread0 ()
>>>>>> > > 3. {
>>>>>> > > 4.     rt.a = -42;
>>>>>> > > 5.     rt.b = -43;
>>>>>> > > 6.     rt.c = -44;
>>>>>> > > 7.     rcu_assign_pointer(gp, &rt);
>>>>>> > > 8. }
>>>>>> > > 9.
>>>>>> > > 10. void thread1 ()
>>>>>> > > 11. {
>>>>>> > > 12.    int i = -1;
>>>>>> > > 13.    int j = -1;
>>>>>> > > 14.    _Dependent_ptr struct rcutest *p;
>>>>>> > > 15.
>>>>>> > > 16.    p = rcu_dereference(gp);
>>>>>> > > 17.    j = p->a;
>>>>>> > > 18.   if (p == &rt)
>>>>>> > > 19.        i = p->b;  /*Dependency breaking point*/
>>>>>> > > 20.   else if(p)
>>>>>> > > 21.       i = p->c;
>>>>>> > > 22.   assert(i<0);
>>>>>> > > 23.   assert(j<0);
>>>>>> > > 24. }
>>>>>> > > The gimple unoptimized code produced for lines 17-24 is shown
>>>>>> below
>>>>>> > >
>>>>>> > > 1. if (p_16 == &rt)
>>>>>> > > 2.     goto <bb 3>; [INV]
>>>>>> > > 3.   else
>>>>>> > > 4.    goto <bb 4>; [INV]
>>>>>> > > 5.
>>>>>> > > 6.  <bb 3> :
>>>>>> > > 7.  i_19 = p_16->b;
>>>>>> > > 8.  goto <bb 6>; [INV]
>>>>>> > > 9.
>>>>>> > > 10.  <bb 4> :
>>>>>> > > 11.  if (p_16 != 0B)
>>>>>> > > 12.    goto <bb 5>; [INV]
>>>>>> > > 13.  else
>>>>>> > > 14.    goto <bb 6>; [INV]
>>>>>> > > 15.
>>>>>> > > 16.  <bb 5> :
>>>>>> > > 17.  i_18 = p_16->c;
>>>>>> > > 18.
>>>>>> > > 19.  <bb 6> :
>>>>>> > > 20.  # i_7 = PHI <i_19(3), i_8(4), i_18(5)>
>>>>>> > > 21.  _3 = i_7 < 0;
>>>>>> > > 22.  _4 = (int) _3;
>>>>>> > > 23.  assert (_4);
>>>>>> > > 24.  _5 = j_17 < 0;
>>>>>> > > 25.  _6 = (int) _5;
>>>>>> > > 26.  assert (_6);
>>>>>> > > 27.  return;
>>>>>> > >
>>>>>> > > The optimized code after -O1 is applied for the same lines is
>>>>>> hown below :
>>>>>> > >
>>>>>> > > 1. if (_2 == &rt)
>>>>>> > > 2.    goto <bb 3>; [30.00%]
>>>>>> > > 3. else
>>>>>> > > 4.    goto <bb 4>; [70.00%]
>>>>>> > > 5.
>>>>>> > > 6.  <bb 3> [local count: 322122547]:
>>>>>> > > 7.   i_12 = rt.b;
>>>>>> > > 8.   goto <bb 6>; [100.00%]
>>>>>> > > 9.
>>>>>> > > 10.  <bb 4> [local count: 751619277]:
>>>>>> > > 11.   if (_1 != 0)
>>>>>> > > 12.   goto <bb 5>; [50.00%]
>>>>>> > > 13.   else
>>>>>> > > 14.    goto <bb 6>; [50.00%]
>>>>>> > > 15.
>>>>>> > > 16.  <bb 5> [local count: 375809638]:
>>>>>> > > 17.   i_11 = MEM[(dependent_ptr struct rcutest *)_2].c;
>>>>>> > > 18.
>>>>>> > > 19.   <bb 6> [local count: 1073741824]:
>>>>>> > > 20.  # i_7 = PHI <i_12(3), i_11(5), -1(4)>
>>>>>> > > 21.   _3 = i_7 < 0;
>>>>>> > > 22.   _4 = (int) _3;
>>>>>> > > 23.   assert (_4);
>>>>>> > > 24.  _5 = j_10 < 0;
>>>>>> > > 25.  _6 = (int) _5;
>>>>>> > > 26.   assert (_6);
>>>>>> > > 27.   return;
>>>>>> >
>>>>>> > Good show on tracing this through!
>>>>>> >
>>>>>> > > Statement 19 in the program gets converted from  i_19 = p_16->b;
>>>>>> in line 7
>>>>>> > > in unoptimized code to i_12 = rt.b; in line 7 in optimized code
>>>>>> which
>>>>>> > > breaks the dependency chain. We need to figure out the pass that
>>>>>> does that
>>>>>> > > and put some handling code in there for the _dependent_ptr
>>>>>> qualified
>>>>>> > > pointers. Passing simply -fipa-pure-const,
>>>>>> -fguess-branch-probability or
>>>>>> > > any other option alone does not produce the optimized code that
>>>>>> breaks the
>>>>>> > > dependency. But applying -O1, i.e., allowing all the
>>>>>> optimizations does so.
>>>>>> > > As passes are applied in a certain order, we need to figure out
>>>>>> up to what
>>>>>> > > passes, the code remains same and after what pass the dependency
>>>>>> does not
>>>>>> > > holds. So, we need to check the translated code after every pass.
>>>>>> > >
>>>>>> > > Does this sounds like a workable plan for ? Let me know your
>>>>>> thoughts. If
>>>>>> > > this sounds good then, we can do this for all the optimizations
>>>>>> that may
>>>>>> > > kill the dependencies at somepoint.
>>>>>> >
>>>>>> > I don't know of a better plan.
>>>>>> >
>>>>>> > My usual question...  Is there some way to script the checking of
>>>>>> the
>>>>>> > translated code at the end of each pass?
>>>>>>
>>>>>> The usual way to check the output of an optimization pass is by
>>>>>> dumping the intermediate code at that point and matching the dump
>>>>>> against a regexp, as in the tree-ssa directories in the testsuite.
>>>>>> -fdump-tree-all will dump after all the gimple optimization passes,
>>>>>> and you can look through them until you find the breakage.
>>>>>>
>>>>>> Jason
>>>>>>
>>>>> Thanks Jason for the advice, I followed it.
>>>>>
>>>>> I coded an example shown in figure 26 from here (
>>>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0190r4.pdf).
>>>>> The example:
>>>>> https://github.com/AKG001/test/blob/master/dependency_breaking.c
>>>>> The .objsz1 code:
>>>>> https://github.com/AKG001/test/blob/master/dependency_breaking.c.027t.objsz1
>>>>> The .ccp1 code:
>>>>> https://github.com/AKG001/test/blob/master/dependency_breaking.c.028t.ccp1
>>>>> The .sra code:
>>>>> https://github.com/AKG001/test/blob/master/dependency_breaking.c.114t.sra
>>>>> The .dom2 code:
>>>>> https://github.com/AKG001/test/blob/master/dependency_breaking.c.116t.dom2
>>>>>
>>>>> Comparing the .objsz1 and .ccp1 file, the "struct p" gets removed
>>>>> after constant copy propagation (ccp) pass(
>>>>> https://github.com/gcc-mirror/gcc/blob/master/gcc/passes.def#L77). In
>>>>> .objsz1 file at line 53, the dependencies start with
>>>>> p_16 = _14;
>>>>> .......
>>>>> i_19 = p_16->b;
>>>>> ......
>>>>> as user specified and after that all the references are made through
>>>>> struct p or its variants. But in .ccp1 file at line 41, the "struct p" is
>>>>> replaced with variable _2, now, variable _2 starts the dependencies as we
>>>>> can see.
>>>>> _2 = (atomic struct rcutest *) _1;
>>>>> ...........
>>>>> i_19 = MEM[(struct rcutest *)_2].b;
>>>>> ..........
>>>>> I don't know whether it is okay to say at this point, the dependencies
>>>>> are still maintained or not. Or we have to pass the dependent pointer
>>>>> qualification to new variables that replaces them. Hoping someone could
>>>>> clarify.
>>>>>
>>>>> Also, In .sra file at line 43 in thread1(), the statement is:
>>>>>     i_12 = MEM[(struct rcutest *)_2].b;
>>>>> In .dom2 file at line 61, the above statement gets converted to:
>>>>>     i_12 = rt.b;
>>>>> So, I believe that sure does breaks the dependency specified by the
>>>>> user. For this, we need to do some modifications in tree-ssa-dom.c file.
>>>>> Hope this makes sense.
>>>>>
>>>>> Thanks,
>>>>> Akshat
>>>>>
>>>>

Reply via email to