On Sat, Apr 13, 2013 at 4:23 AM, Vadim Girlin <vadimgir...@gmail.com> wrote: > On 04/12/2013 11:36 PM, Martin Andersson wrote: >> >> I have made some progress with this issue. >> >> Vadim, I did as you suggested and tried to mimic the output from the >> shader analyser >> tool. I used your patch as a base and then tried various ways to see >> what would work. >> After many tries (and lockups) I did managed to get the >> ext_transform_feedback/order >> test to pass. >> >> It is a very ugly patch but it should illustrate what the problem (and >> potential solution) is. >> >> Your test program fails however because explicit break statements do >> not work. It >> should be possible to use the same code for the explicit breaks as for >> the implicit >> loop break.The reason it does not is that I detect the implicit break >> with a hack and >> it does notwork for explicit breaks. >> >> The problem is that I need to detect the break statement when creating the >> corresponding if statement. So that I can treat it differently than >> other "regular" if >> statements. Anyone knows how I could do that, or is this the wrong >> approach? >> > > It doesn't work with my test app because IF/ENDIF blocks with BRK may > contain other code, so you can't simply throw away IF/ENDIF making that code > execute unconditionally.
Yeah my hack is not an viable option. > By the way, shader analyzer in some cases also produces the code with > JUMP/POP around PRED_SET-BREAK, though I'm not sure if that code will really > work as expected with catalyst. Possibly we're simply missing something in > the hardware configuration. > > Also there is one thing that I didn't take into account in my initial patch > - r600g converts ALU followed by POP to ALU_POP_AFTER and this might explain > why my initial patch doesn't work. Possibly if we prevent that optimization > for ALU containing PRED_SET-BREAK and leave separate POP, it might be enough > to make it work. I'm attaching the additional patch that will force POP to > be a separate instruction in this case, please test it (on top of the my > first patch). This would be at least not very intrusive. No, that patch did not help either. > If this won't help, then I think we should understand what exactly we are > trying to fix before implementing any big changes, possibly there is a > better solution or at least a more clean workaround. In the worst case we > can return to your approach and improve it to handle other cases. I'm starting to think that there is nothing wrong with the shader compiler. It seems to me that a push, pop inside a nested loop clears the break status on a thread. shift_reg = 1u; count = 0u; while (true) { if (x == 1u) break; while (true) { if (x != 1u) count = 10u; if (x == 1u) count = 20u; break; } shift_reg = 2u; break; } input: x == 0 actual ouput: shift_reg == 2, count == 10 expected output: shift_reg == 2, count == 10 input: x == 1 actual ouput: shift_reg == 2, count == 20 expected output: shift_reg == 1, count == 0 If I swap the if statements in the inner loop I get different results. shift_reg = 1u; count = 0u; while (true) { if (x == 1u) break; while (true) { if (x == 1u) count = 20u; if (x != 1u) count = 10u; break; } shift_reg = 2u; break; } input: x == 0 actual ouput: shift_reg == 2, count == 10 expected output: shift_reg == 2, count == 10 input: x == 1 actual ouput: shift_reg == 2, count == 0 expected output: shift_reg == 1, count == 0 I tested both cases on mesa master and mesa master + Vadims two patches with the same results. //Martin > Vadim > > >> //Martin >> >> On Thu, Apr 11, 2013 at 5:31 PM, Vadim Girlin <vadimgir...@gmail.com> >> wrote: >>> >>> On 04/11/2013 02:08 AM, Marek Olšák wrote: >>>> >>>> >>>> Here's the output: >>>> >>>> creating vs ... >>>> shader compilation status: OK >>>> creating fs ... >>>> shader compilation status: OK >>>> thread #0 (0;0) : ref = 16608 >>>> thread #1 (1;0) : ref = 27873 >>>> thread #2 (0;1) : ref = 16608 >>>> thread #3 (1;1) : ref = 27877 >>>> results: >>>> thread 0 (0, 0): expected = 16608, observed = 27876, FAIL >>>> thread 1 (1, 0): expected = 27873, observed = 27873, OK >>>> thread 2 (0, 1): expected = 16608, observed = 27876, FAIL >>>> thread 3 (1, 1): expected = 27877, observed = 27877, OK >>>> >>> >>> Thanks. According to these results, it looks like LOOP_START_DX10 for >>> inner >>> loop somehow reactivates the threads that were put into inactive-break >>> state >>> by the LOOP_BREAK in the outer loop. Also it seems LOOP_BREAK in the >>> inner >>> loop doesn't work as expected in this case. In other words, it looks >>> weird. >>> >>> I can't explain why would this happen. It might be interesting to run >>> these >>> tests with llvm backend to see if there are any differences. >>> >>> Probably it might help if we'll implement LOOP_BREAK via EXECUTE_MASK_OP >>> in >>> the PRED_SET encoding as in my earlier patch, but without any stack >>> push/pop >>> operations and jumps (where it's possible), closer to what the catalyst >>> (shader analyzer) does. I'm not sure if it will help though, and anyway >>> we'll need stack operations in some cases, so I'm afraid this won't fix >>> the >>> issue completely. >>> >>> So far I have no other ideas. >>> >>> Vadim >>> >>>> Marek >>>> >>>> >>>> On Wed, Apr 10, 2013 at 11:42 PM, Vadim Girlin >>>> <vadimgir...@gmail.com>wrote: >>>> >>>>> On 04/10/2013 01:53 PM, Marek Olšák wrote: >>>>> >>>>>> glsl-fs-loop-nested passes here. >>>>>> >>>>>> nstack is 3 and adding 4 to it doesn't help. >>>>>> >>>>> >>>>> Ok, thanks. >>>>> >>>>> Also I wrote a simple test app that should reproduce the issue if it's >>>>> really related to diverging control flow with nested loops and might >>>>> more >>>>> information about what's going wrong. >>>>> >>>>> The source is in the attachment and needs to be compiled with -lGL >>>>> -lglut >>>>> -lGLEW. The app renders four points and computes some value for each >>>>> point >>>>> in the loops similar to the transform feedback order test, but it >>>>> doesn't >>>>> use tfb. It should render four green or red squares depending on >>>>> correctness of the result. >>>>> >>>>> Here is the correct output produced for me on evergreen: >>>>> >>>>> thread 0 (0, 0): expected = 16608, observed = 16608, OK >>>>> thread 1 (1, 0): expected = 27873, observed = 27873, OK >>>>> thread 2 (0, 1): expected = 16608, observed = 16608, OK >>>>> thread 3 (1, 1): expected = 27877, observed = 27877, OK >>>>> >>>>> Please post the output if it fails on cayman. >>>>> >>>>> Vadim >>>>> >>>>> >>>>> >>>>>> Marek >>>>>> >>>>>> >>>>>> On Wed, Apr 10, 2013 at 8:46 AM, Vadim Girlin <vadimgir...@gmail.com> >>>>>> wrote: >>>>>> >>>>>> On 04/10/2013 03:58 AM, Marek Olšák wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> Hi Vadim, >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> your patch does not fix the test. >>>>>>>> >>>>>>>> >>>>>>> Hmm, I'm out of ideas then. Thanks for testing. >>>>>>> >>>>>>> I've checked the shader dump few times but I don't see anything >>>>>>> obviously >>>>>>> wrong there, and the same code (except the minor ALU grouping changes >>>>>>> due >>>>>>> to the VLIW4/VLIW5 difference) works fine for me on evergreen. >>>>>>> >>>>>>> According to the Martin's observations it looks like if the threads >>>>>>> that >>>>>>> shouldn't execute the loop body were incorrectly left in the active >>>>>>> state. >>>>>>> LOOP_BREAK should put them into the inactive-break state, but >>>>>>> something >>>>>>> goes wrong. Do the other piglit tests with nested loops (e.g. >>>>>>> glsl-fs-loop-nested) work on cayman? Though possibly there are no >>>>>>> other >>>>>>> tests with the diverging loops as in this case. >>>>>>> >>>>>>> I'll try to write a simpler test with the diverging loops to see if >>>>>>> the >>>>>>> issue is really caused by the incorrect control flow handling, and to >>>>>>> figure out the exact instruction that results in the incorrect active >>>>>>> state. >>>>>>> >>>>>>> Also probably it worth checking if the stack size is correct for that >>>>>>> shader (latest mesa should print nstack value in the shader >>>>>>> disassemble >>>>>>> header, I think it should be 3 for that shader) and maybe try adding >>>>>>> some >>>>>>> constant, e.g. 4 to the bc->nstack in the r600_bytecode_build just to >>>>>>> be >>>>>>> sure that we reserve enough of stack space, though I don't think >>>>>>> stack >>>>>>> size >>>>>>> is the cause of this issue. >>>>>>> >>>>>>> Vadim >>>>>>> >>>>>>> >>>>>>> >>>>>>> Marek >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Apr 9, 2013 at 11:30 PM, Vadim Girlin >>>>>>>> <vadimgir...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>> On 04/09/2013 10:58 AM, Martin Andersson wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Apr 9, 2013 at 3:18 AM, Marek Olšák <mar...@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Pushed, thanks. The transform feedback test still doesn't >>>>>>>>>> pass, >>>>>>>>>> but >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>>> least >>>>>>>>>>> the hardlocks are gone. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks, I have looked into the other issue as well >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> http://lists.freedesktop.org/******archives/mesa-dev/2013-**March/**<http://lists.freedesktop.org/****archives/mesa-dev/2013-March/**> >>>>>>>>>> **036941.html<http://lists.**freedesktop.org/**archives/** >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> mesa-dev/2013-March/**036941.**html<http://lists.freedesktop.org/**archives/mesa-dev/2013-March/**036941.html> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> <http://lists.**freedesktop.**org/archives/mesa-**<http://freedesktop.org/archives/mesa-**> >>>>>>>>>> dev/2013-March/036941.html<htt**p://lists.freedesktop.org/** >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> archives/mesa-dev/2013-March/**036941.html<http://lists.freedesktop.org/archives/mesa-dev/2013-March/036941.html> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> The problem arises when there are nested loops. If I rework the >>>>>>>>>> code >>>>>>>>>> so there are >>>>>>>>>> no nested loops the issue disappears. At least one pixel also >>>>>>>>>> needs >>>>>>>>>> to >>>>>>>>>> enter the >>>>>>>>>> outer loop. The pixels that should enter the outer loop behaves >>>>>>>>>> correctly. It is those >>>>>>>>>> pixels that should not enter the outer loop that misbehaves. It >>>>>>>>>> does >>>>>>>>>> not matter if they >>>>>>>>>> also fails the test for the inner loop, they will still execute >>>>>>>>>> the >>>>>>>>>> instruction inside. That >>>>>>>>>> leads to the strange results for that test. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Please test the attached patch. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Vadim >>>>>>>>> >>>>>>>>> >>>>>>>>> The strangeness is easier to see if the NUM_POINTS in the >>>>>>>>> >>>>>>>>>> ext_transform_feedback/ >>>>>>>>>> order.c are run with smaller values,like 3, 6 and 9. Disable the >>>>>>>>>> code >>>>>>>>>> that fail the test >>>>>>>>>> and print starting_x, shift_reg_final and iteration_count. >>>>>>>>>> >>>>>>>>>> Marek, since you implemented transform feedback for r600, do you >>>>>>>>>> think >>>>>>>>>> the issue >>>>>>>>>> is with the tranform feedback code or the shader compiler or some >>>>>>>>>> other >>>>>>>>>> thing? >>>>>>>>>> >>>>>>>>>> //Martin >>>>>>>>>> ______________________________******_________________ >>>>>>>>>> mesa-dev mailing list >>>>>>>>>> mesa-dev@lists.freedesktop.org >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> http://lists.freedesktop.org/******mailman/listinfo/mesa-dev<http://lists.freedesktop.org/****mailman/listinfo/mesa-dev> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> <h**ttp://lists.freedesktop.org/****mailman/listinfo/mesa-dev<http://lists.freedesktop.org/**mailman/listinfo/mesa-dev> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> <htt**p://lists.freedesktop.**org/**mailman/listinfo/mesa-**dev<http://lists.freedesktop.org/**mailman/listinfo/mesa-dev> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> <http://lists.freedesktop.**org/mailman/listinfo/mesa-dev<http://lists.freedesktop.org/mailman/listinfo/mesa-dev> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ______________________________****_________________ >>>>>>>>> >>>>>>>>> >>>>>>>>> mesa-dev mailing list >>>>>>>>> mesa-dev@lists.freedesktop.org >>>>>>>>> >>>>>>>>> >>>>>>>>> http://lists.freedesktop.org/****mailman/listinfo/mesa-dev<http://lists.freedesktop.org/**mailman/listinfo/mesa-dev> >>>>>>>>> >>>>>>>>> >>>>>>>>> <htt**p://lists.freedesktop.org/**mailman/listinfo/mesa-dev<http://lists.freedesktop.org/mailman/listinfo/mesa-dev> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >>> _______________________________________________ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev > > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev