On Wed, Jul 24, 2019 at 1:41 AM Akshat Garg <xks...@gmail.com> wrote:
> Hi all, > > I have tried to make the dependent_ptr qualification act as volatile > during the RTL passes to bypass the RTL optimizations for now. Here is the > patch > https://github.com/AKG001/gcc/commit/14c05ae546554f822f667fdb72080b7fe52fea32 > For this (https://github.com/AKG001/test/blob/master/dependency_breaking.c) > example, I have been able to get this ( > https://github.com/AKG001/test/blob/master/dependency_breaking.c.317r.final) > .final code. I believe that there are no dependencies that are broken. > Hoping someone else could also confirm if I have mistaken somewhere. I have > given the dependent_ptr qualification to only memory type objects for now. > The flag /d represents that this memory represents the dependent pointer in > .final code. > I am checking up the other examples form document P0190R4 to see what > other things need to be done. Let me know what you guys think. > Sorry, I forgot to give the .optimized code for the above example. Here, it is https://github.com/AKG001/test/blob/master/dependency_breaking.c.231t.optimize <https://github.com/AKG001/test/blob/master/dependency_breaking.c.231t.optimized> May be needed. > > -Akshat > > On Mon, Jul 22, 2019 at 3:16 PM Richard Biener <richard.guent...@gmail.com> > wrote: > >> On Mon, Jul 22, 2019 at 10:54 AM Akshat Garg <xks...@gmail.com> wrote: >> >>> On Mon, Jul 22, 2019 at 2:11 PM Richard Biener < >>> richard.guent...@gmail.com> wrote: >>> >>>> On Mon, Jul 22, 2019 at 2:27 AM Akshat Garg <xks...@gmail.com> wrote: >>>> >>>>> Hi all, >>>>> Consider part of an example(figure 20) from doc P0190R4( >>>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0190r4.pdf) >>>>> shown below: >>>>> >>>>> 1. void thread1 (void) >>>>> 2. { >>>>> 3. int * volatile p; >>>>> 4. p = rcu_dereference(gip); >>>>> 5. if (p) >>>>> 6. assert(*(p+p[0]) == 42); >>>>> 7. } >>>>> The .gimple code produced is : >>>>> >>>>> 1. thread1 () >>>>> 2. { >>>>> 3. atomic int * D.1992; >>>>> 4. int * volatile p; >>>>> 5. { >>>>> 6. atomic int * * __atomic_load_ptr; >>>>> 7. atomic int * __atomic_load_tmp; >>>>> 8. try >>>>> 9. { >>>>> 10. __atomic_load_ptr = &gip; >>>>> 11. _1 = __atomic_load_8 (__atomic_load_ptr, 1); >>>>> 12. _2 = (atomic int *) _1; >>>>> 13. __atomic_load_tmp = _2; >>>>> 14. D.1992 = __atomic_load_tmp; >>>>> 15. } >>>>> 16. finally >>>>> 17. { >>>>> 18. __atomic_load_tmp = {CLOBBER}; >>>>> 19. } >>>>> 20. } >>>>> 21. p = D.1992; >>>>> 22. p.2_3 = p; >>>>> 23. if (p.2_3 != 0B) goto <D.1994>; else goto <D.1995>; >>>>> 24. <D.1994>: >>>>> 25. p.3_4 = p; >>>>> 26. p.4_5 = p; >>>>> 27. _6 = *p.4_5; >>>>> 28. _7 = (long unsigned int) _6; >>>>> 29. _8 = _7 * 4; >>>>> 30. _9 = p.3_4 + _8; >>>>> 31. _10 = *_9; >>>>> 32. _11 = _10 == 42; >>>>> 33. _12 = (int) _11; >>>>> 34. assert (_12); >>>>> 35. <D.1995>: >>>>> 36. } >>>>> >>>>> assert at line 34 in .gimple code still breaks the dependency given by >>>>> the user. I believe, there should be some ssa defined variable of p or p >>>>> itself in assert. This is happening when I am considering pointer as >>>>> volatile qualified. If I consider it as _Dependent_ptr qualified then it >>>>> surely produces the broken dependency code. Let me know, if I wrong >>>>> somewhere. >>>>> >>>>> >>>> p appears as memory here which we load its value to p.3_4 which we then >>>> offset by _8 and load from the >>>> computed address into _10 which then appears in the assert condition. >>>> I think that's as good as it can >>>> get ... >>>> >>>> Richard. >>>> >>> >>> Thank you for your reply. For, the same example above, consider this ( >>> https://github.com/AKG001/rtl_opt/blob/master/p0190r4_fig20.c.239r.cse1#L402) >>> instruction at rtl level changed form this ( >>> https://github.com/AKG001/rtl_opt/blob/master/p0190r4_fig20.c.239r.cse1#L231) >>> during the cse pass. The variable p.2_3 gets replaced by a temporary _1 but >>> _1 is not any dependent pointer where, p.2_3 was. Is this also not breaking >>> any dependencies >>> >> >> I'm not sure. In general CSE can break dependences. If the dependent >> pointer chain needs to conver multiple levels of >> indirections from the original atomic operation you need to make sure to >> not expose atomics as CSEable. Thus on >> RTL have them all UNSPECs. >> >> Richard. >> >> >>> -Akshat >>>>> >>>>> >>>>> >>>>> >>>>> On Wed, Jul 17, 2019 at 4:23 PM Akshat Garg <xks...@gmail.com> wrote: >>>>> >>>>>> On Tue, Jul 2, 2019 at 9:06 PM Jason Merrill <ja...@redhat.com> >>>>>> wrote: >>>>>> >>>>>>> On Mon, Jul 1, 2019 at 8:59 PM Paul E. McKenney < >>>>>>> paul...@linux.ibm.com> wrote: >>>>>>> > >>>>>>> > On Tue, Jul 02, 2019 at 05:58:48AM +0530, Akshat Garg wrote: >>>>>>> > > On Tue, Jun 25, 2019 at 9:49 PM Akshat Garg <xks...@gmail.com> >>>>>>> wrote: >>>>>>> > > >>>>>>> > > > On Tue, Jun 25, 2019 at 4:04 PM Ramana Radhakrishnan < >>>>>>> > > > ramana....@googlemail.com> wrote: >>>>>>> > > > >>>>>>> > > >> On Tue, Jun 25, 2019 at 11:03 AM Akshat Garg < >>>>>>> xks...@gmail.com> wrote: >>>>>>> > > >> > >>>>>>> > > >> > As we have some working front-end code for _Dependent_ptr, >>>>>>> What should >>>>>>> > > >> we do next? What I understand, we can start adding the >>>>>>> library for >>>>>>> > > >> dependent_ptr and its functions for C corresponding to the >>>>>>> ones we created >>>>>>> > > >> as C++ template library. Then, after that, we can move on to >>>>>>> generating the >>>>>>> > > >> assembly code part. >>>>>>> > > >> > >>>>>>> > > >> >>>>>>> > > >> >>>>>>> > > >> I think the next step is figuring out how to model the >>>>>>> Dependent >>>>>>> > > >> pointer information in the IR and figuring out what >>>>>>> optimizations to >>>>>>> > > >> allow or not with that information. At this point , I suspect >>>>>>> we need >>>>>>> > > >> a plan on record and have the conversation upstream on the >>>>>>> lists. >>>>>>> > > >> >>>>>>> > > >> I think we need to put down a plan on record. >>>>>>> > > >> >>>>>>> > > >> Ramana >>>>>>> > > > >>>>>>> > > > [CCing gcc mailing list] >>>>>>> > > > >>>>>>> > > > So, shall I start looking over the pointer optimizations only >>>>>>> and see what >>>>>>> > > > information we may be needed on the same examples in the IR >>>>>>> itself? >>>>>>> > > > >>>>>>> > > > - Akshat >>>>>>> > > > >>>>>>> > > I have coded an example where equality comparison kills >>>>>>> dependency from the >>>>>>> > > document P0190R4 as shown below : >>>>>>> > > >>>>>>> > > 1. struct rcutest rt = {1, 2, 3}; >>>>>>> > > 2. void thread0 () >>>>>>> > > 3. { >>>>>>> > > 4. rt.a = -42; >>>>>>> > > 5. rt.b = -43; >>>>>>> > > 6. rt.c = -44; >>>>>>> > > 7. rcu_assign_pointer(gp, &rt); >>>>>>> > > 8. } >>>>>>> > > 9. >>>>>>> > > 10. void thread1 () >>>>>>> > > 11. { >>>>>>> > > 12. int i = -1; >>>>>>> > > 13. int j = -1; >>>>>>> > > 14. _Dependent_ptr struct rcutest *p; >>>>>>> > > 15. >>>>>>> > > 16. p = rcu_dereference(gp); >>>>>>> > > 17. j = p->a; >>>>>>> > > 18. if (p == &rt) >>>>>>> > > 19. i = p->b; /*Dependency breaking point*/ >>>>>>> > > 20. else if(p) >>>>>>> > > 21. i = p->c; >>>>>>> > > 22. assert(i<0); >>>>>>> > > 23. assert(j<0); >>>>>>> > > 24. } >>>>>>> > > The gimple unoptimized code produced for lines 17-24 is shown >>>>>>> below >>>>>>> > > >>>>>>> > > 1. if (p_16 == &rt) >>>>>>> > > 2. goto <bb 3>; [INV] >>>>>>> > > 3. else >>>>>>> > > 4. goto <bb 4>; [INV] >>>>>>> > > 5. >>>>>>> > > 6. <bb 3> : >>>>>>> > > 7. i_19 = p_16->b; >>>>>>> > > 8. goto <bb 6>; [INV] >>>>>>> > > 9. >>>>>>> > > 10. <bb 4> : >>>>>>> > > 11. if (p_16 != 0B) >>>>>>> > > 12. goto <bb 5>; [INV] >>>>>>> > > 13. else >>>>>>> > > 14. goto <bb 6>; [INV] >>>>>>> > > 15. >>>>>>> > > 16. <bb 5> : >>>>>>> > > 17. i_18 = p_16->c; >>>>>>> > > 18. >>>>>>> > > 19. <bb 6> : >>>>>>> > > 20. # i_7 = PHI <i_19(3), i_8(4), i_18(5)> >>>>>>> > > 21. _3 = i_7 < 0; >>>>>>> > > 22. _4 = (int) _3; >>>>>>> > > 23. assert (_4); >>>>>>> > > 24. _5 = j_17 < 0; >>>>>>> > > 25. _6 = (int) _5; >>>>>>> > > 26. assert (_6); >>>>>>> > > 27. return; >>>>>>> > > >>>>>>> > > The optimized code after -O1 is applied for the same lines is >>>>>>> hown below : >>>>>>> > > >>>>>>> > > 1. if (_2 == &rt) >>>>>>> > > 2. goto <bb 3>; [30.00%] >>>>>>> > > 3. else >>>>>>> > > 4. goto <bb 4>; [70.00%] >>>>>>> > > 5. >>>>>>> > > 6. <bb 3> [local count: 322122547]: >>>>>>> > > 7. i_12 = rt.b; >>>>>>> > > 8. goto <bb 6>; [100.00%] >>>>>>> > > 9. >>>>>>> > > 10. <bb 4> [local count: 751619277]: >>>>>>> > > 11. if (_1 != 0) >>>>>>> > > 12. goto <bb 5>; [50.00%] >>>>>>> > > 13. else >>>>>>> > > 14. goto <bb 6>; [50.00%] >>>>>>> > > 15. >>>>>>> > > 16. <bb 5> [local count: 375809638]: >>>>>>> > > 17. i_11 = MEM[(dependent_ptr struct rcutest *)_2].c; >>>>>>> > > 18. >>>>>>> > > 19. <bb 6> [local count: 1073741824]: >>>>>>> > > 20. # i_7 = PHI <i_12(3), i_11(5), -1(4)> >>>>>>> > > 21. _3 = i_7 < 0; >>>>>>> > > 22. _4 = (int) _3; >>>>>>> > > 23. assert (_4); >>>>>>> > > 24. _5 = j_10 < 0; >>>>>>> > > 25. _6 = (int) _5; >>>>>>> > > 26. assert (_6); >>>>>>> > > 27. return; >>>>>>> > >>>>>>> > Good show on tracing this through! >>>>>>> > >>>>>>> > > Statement 19 in the program gets converted from i_19 = p_16->b; >>>>>>> in line 7 >>>>>>> > > in unoptimized code to i_12 = rt.b; in line 7 in optimized code >>>>>>> which >>>>>>> > > breaks the dependency chain. We need to figure out the pass that >>>>>>> does that >>>>>>> > > and put some handling code in there for the _dependent_ptr >>>>>>> qualified >>>>>>> > > pointers. Passing simply -fipa-pure-const, >>>>>>> -fguess-branch-probability or >>>>>>> > > any other option alone does not produce the optimized code that >>>>>>> breaks the >>>>>>> > > dependency. But applying -O1, i.e., allowing all the >>>>>>> optimizations does so. >>>>>>> > > As passes are applied in a certain order, we need to figure out >>>>>>> up to what >>>>>>> > > passes, the code remains same and after what pass the dependency >>>>>>> does not >>>>>>> > > holds. So, we need to check the translated code after every pass. >>>>>>> > > >>>>>>> > > Does this sounds like a workable plan for ? Let me know your >>>>>>> thoughts. If >>>>>>> > > this sounds good then, we can do this for all the optimizations >>>>>>> that may >>>>>>> > > kill the dependencies at somepoint. >>>>>>> > >>>>>>> > I don't know of a better plan. >>>>>>> > >>>>>>> > My usual question... Is there some way to script the checking of >>>>>>> the >>>>>>> > translated code at the end of each pass? >>>>>>> >>>>>>> The usual way to check the output of an optimization pass is by >>>>>>> dumping the intermediate code at that point and matching the dump >>>>>>> against a regexp, as in the tree-ssa directories in the testsuite. >>>>>>> -fdump-tree-all will dump after all the gimple optimization passes, >>>>>>> and you can look through them until you find the breakage. >>>>>>> >>>>>>> Jason >>>>>>> >>>>>> Thanks Jason for the advice, I followed it. >>>>>> >>>>>> I coded an example shown in figure 26 from here ( >>>>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0190r4.pdf). >>>>>> The example: >>>>>> https://github.com/AKG001/test/blob/master/dependency_breaking.c >>>>>> The .objsz1 code: >>>>>> https://github.com/AKG001/test/blob/master/dependency_breaking.c.027t.objsz1 >>>>>> The .ccp1 code: >>>>>> https://github.com/AKG001/test/blob/master/dependency_breaking.c.028t.ccp1 >>>>>> The .sra code: >>>>>> https://github.com/AKG001/test/blob/master/dependency_breaking.c.114t.sra >>>>>> The .dom2 code: >>>>>> https://github.com/AKG001/test/blob/master/dependency_breaking.c.116t.dom2 >>>>>> >>>>>> Comparing the .objsz1 and .ccp1 file, the "struct p" gets removed >>>>>> after constant copy propagation (ccp) pass( >>>>>> https://github.com/gcc-mirror/gcc/blob/master/gcc/passes.def#L77). >>>>>> In .objsz1 file at line 53, the dependencies start with >>>>>> p_16 = _14; >>>>>> ....... >>>>>> i_19 = p_16->b; >>>>>> ...... >>>>>> as user specified and after that all the references are made through >>>>>> struct p or its variants. But in .ccp1 file at line 41, the "struct p" is >>>>>> replaced with variable _2, now, variable _2 starts the dependencies as we >>>>>> can see. >>>>>> _2 = (atomic struct rcutest *) _1; >>>>>> ........... >>>>>> i_19 = MEM[(struct rcutest *)_2].b; >>>>>> .......... >>>>>> I don't know whether it is okay to say at this point, the >>>>>> dependencies are still maintained or not. Or we have to pass the >>>>>> dependent >>>>>> pointer qualification to new variables that replaces them. Hoping someone >>>>>> could clarify. >>>>>> >>>>>> Also, In .sra file at line 43 in thread1(), the statement is: >>>>>> i_12 = MEM[(struct rcutest *)_2].b; >>>>>> In .dom2 file at line 61, the above statement gets converted to: >>>>>> i_12 = rt.b; >>>>>> So, I believe that sure does breaks the dependency specified by the >>>>>> user. For this, we need to do some modifications in tree-ssa-dom.c file. >>>>>> Hope this makes sense. >>>>>> >>>>>> Thanks, >>>>>> Akshat >>>>>> >>>>>