Hi Nick,

Which optimization flag are you using when compiling your program? My guess is that the behavior you are observing occurs because the compiler can figure out that x is a constant that can be determined statically, so the binary it generates in both cases probably does no arithmetic inside the loop (there is nothing else in the loop either). As a result, you don't see any difference in cycle count.

As far as your blog post is concerned, please note that the format block defines how your instruction will be executed in the simulation. The value of `i' will not have any impact on the latency of the instruction. To change the latency, you will have to separately change the latency of the opClass of the instruction you have added. For example, for O3CPU you can have a look at the latencies of different instructions:
https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/src/cpu/o3/FuncUnitConfig.py

-Ayaz

On Tue, Jul 11, 2023 at 6:16 PM Nick F via gem5-users <gem5-users@gem5.org> wrote:

> Good afternoon,
>
> I have been trying to use Gem5 to research and study the performance of
> several different computer architectures. However, I have been noticing
> that I may be unable to accurately model the differences in cycle length
> for computer programs.
>
> Take for example these two programs:
>
> #include <stdint.h>
>
> int main(void)
> {
>     for (uint32_t i = 0; i < 1000; i++) {
>         uint32_t x = 5 * 6;
>         if (x != 30) {
>             return 1;
>         }
>     }
>     return 0;
> }
>
> #include <stdint.h>
>
> int main(void)
> {
>     for (uint32_t i = 0; i < 1000; i++) {
>         uint32_t x = 5 + 6;
>         if (x != 11) {
>             return 1;
>         }
>     }
>     return 0;
> }
>
> Compiling and running both individually on a basic RISC-V CPU config, they
> both exit at exactly 1,297,721,000. However, in a real system, each
> multiply operation would take longer and I'd suspect doing 1000
> multiplications would have even a tiny difference in performance. My own
> research would also have difficulties analyzing relative performance unless
> I'm missing something.
>
> Even custom instructions seem to execute in a single CPU cycle regardless
> of how the hardware would be implemented.
>
> Is there a good way to define cycle delays in my Gem5 environment? I can
> implement a "multiply" function that inserts a bunch of no-ops, but that would
> make it more complicated when the program complexity grows.
>
> I've written a small blog post
> <https://fleker.medium.com/modeling-memristors-to-execute-physically-accurate-imply-operations-in-gem5-ef888b7dc49b>
> exploring some of what I've tried in the past week. If anyone here has any
> suggestions I'd be interested to hear them.
>
> Thanks,
>
> Nick
> _______________________________________________
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org