On 05/18/2015 02:01 PM, mark maule wrote:
I have a loop which hangs when compiled with -O2, but runs fine when compiled with -O1. Not sure what information is required to get an answer, so starting with the full src code. I have not attempted to reduce to a simpler test case yet.
Typically a standalone test case is required. Note that a difference in behavior between -O1 and -O2 doesn't necessarily imply there's a compiler bug. One or more of the -O2 optimizations could be exposing a bug in the program. This is true even if the program runs successfully when compiled with a prior version of the compiler, since new optimizations are added and existing ones improve between releases. Nevertheless, compiling the program with different versions of the compiler (or different compilers altogether) to see if the problem can be consistently reproduced can provide a useful data point.
Attachments:
- bs_destage.c - full source code
- bs_destage.dis.O2 - gdb disassembly of bs_destageLoop()
- bs_destage.dis+m.O2 - source-annotated version of the above
I suspect this is too big for most people to analyze (it is for me). You will need to reduce it to something more manageable that can be compiled independently of the rest of your system (all the "..." include files).
The function in question is bs_destageSearch(). When I compile bs_destage.c with -O2, it seems that the dgHandle condition at line 741 is being ignored, leading to an infinite loop. I can see in the disassembly that dgHandle is still in the code as a 16-bit value stored at 0x32(%rsp), and a running 32-bit copy stored at 0x1c(%rsp). I can also see that the 16-bit version at 0x32(%rsp) is being incremented at the end of the loop, but I don't see anywhere in the code where either version of dgHandle is being used when determining if the while() at line 741 should be continued. I'm not very familiar with the optimizations that are done in -O2 vs. -O1, or even what happens in these optimizations. So, I'm wondering if this is a bug, or a subtle valid optimization that I don't understand. Any help would be appreciated.
It's hard to tell where the problem is or offer suggestions without a more complete understanding of the objects and types used in the program: for example, the underlying type of dgHandle_t or the value of the BS_CFG_DRIVE_GROUPS constant the variable is being compared to [*]. The way to expose these details is to create a translation unit for the source file (i.e., to preprocess it with the -E option). From these details it should be possible to determine whether the code in the file is correct (in which case the likely cause of the problem is a compiler bug) or whether the problem is due to a bug in the code. The translation unit will be much bigger than the source file, so you will need to do the initial analysis yourself before asking for more help (on gcc-help).
Note: changing the declaration of dgHandle to be volatile appears to modify the code sufficiently that it looks like the dgHandle check is honored (have not tested).
[*] For instance, if dgHandle_t was an alias for unsigned short and the value of BS_CFG_DRIVE_GROUPS was USHRT_MAX - 1, then GCC could remove the corresponding test in the while loop (since it could never exceed the maximum value for its type) unless dgHandle was declared volatile.
Thanks in advance for any help/advice.
The gcc list is meant for discussions related to GCC development. The gcc-help list is the right forum for requests for help.

Martin