Hi all,

I am studying what kind of information a compiler can pass to a binary translator (QEMU, for example) so that the translator can optimize more aggressively. A previous discussion [1] gave an example of what I want to do. That discussion ended with the conclusion that GCC does not maintain the CFG all the way to assembly emission.

Here I want to know whether we can get loop boundaries from GCC, for example that addresses 0x0010 and 0x0020 are the start and end of a loop, respectively. Currently, our binary translator treats every backward branch as a loop. I don't know whether this simple heuristic identifies all loops, at least in binaries generated by GCC. Some studies argue that the heuristic might not work as expected, because the compiler may perform hot-cold optimization and code repositioning, which produce backward branches that are NOT loop-back branches.

I also asked a similar question on the LLVM mailing list [2]:

"What I want to do is to locate the range of a for-loop statement in a binary. For example, given the for-loop statement below,

    for (stat1; stat2; stat3) { /* do something */ }

is it possible to get information about the range (binary address) of the above for-loop, say, 0x0100 - 0x0120?"

That thread concluded that various optimizations may reorder the code, and that one possible way to disable those optimizations is to insert an inline assembly symbol before and after the loop.

Any comments appreciated.

Regards,
chenwj

[1] http://www.mail-archive.com/gcc@gcc.gnu.org/msg58152.html
[2] http://markmail.org/thread/a2ze4v7o4ez64xmd

--
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
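
P.S. To make the backward-branch heuristic mentioned above concrete, here is a minimal sketch in C. It is not our translator's actual code: the types branch_t and loop_range_t and the function find_candidate_loops are hypothetical names made up for this illustration, and it assumes the translator already has a list of branch instructions with their source and target addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One branch instruction as seen by the translator (hypothetical layout). */
typedef struct {
    uint32_t pc;     /* address of the branch instruction itself */
    uint32_t target; /* address the branch jumps to */
} branch_t;

/* A candidate loop, as an inclusive range of guest addresses. */
typedef struct {
    uint32_t start;
    uint32_t end;
} loop_range_t;

/*
 * Backward-branch heuristic: every branch whose target is at or below
 * its own address is reported as a loop spanning [target, pc].
 * Writes at most out_cap entries to out and returns how many it wrote.
 */
size_t find_candidate_loops(const branch_t *br, size_t n,
                            loop_range_t *out, size_t out_cap)
{
    size_t found = 0;

    assert(br != NULL || n == 0);
    assert(out != NULL || out_cap == 0);

    for (size_t i = 0; i < n && found < out_cap; i++) {
        if (br[i].target <= br[i].pc) {
            out[found].start = br[i].target;
            out[found].end = br[i].pc;
            found++;
        }
    }
    return found;
}
```

Given a branch at 0x0020 that targets 0x0010, the sketch reports the range 0x0010 - 0x0020, which matches the loop example in my question. It would equally report a branch at 0x0050 that targets 0x0008, even if that branch only jumps back to a block the compiler moved earlier in the binary. The heuristic cannot tell the two cases apart, which is why I am asking for boundary information from the compiler.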