Hi Nick,

Which optimization flag are you using when compiling your program? My
guess is that the behavior you are observing occurs because the compiler
can determine statically that x is a constant, so the binary it generates
in both cases probably does nothing inside the loop (there is nothing else
in the loop body either). As a result, you don't see any difference in
cycle count.
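
One way to check this is to keep the compiler from folding the constants. The following is only a minimal sketch, not something I have run in gem5: the names a, b, run_mul and run_add are made up for illustration, and it assumes your toolchain targets a RISC-V configuration with a hardware multiply. Marking the operands volatile forces them to be loaded at run time, so the multiply or add should survive optimization:

```c
#include <stdint.h>

/* Hypothetical operands. 'volatile' forces a load on every access, so the
 * compiler cannot replace a * b or a + b with a precomputed constant. */
static volatile uint32_t a = 5;
static volatile uint32_t b = 6;

/* Multiply variant: returns the number of iterations with a wrong result
 * (0 is expected). */
uint32_t run_mul(uint32_t iters)
{
    uint32_t bad = 0;
    for (uint32_t i = 0; i < iters; i++) {
        uint32_t x = a * b;  /* computed at run time on each iteration */
        if (x != 30) {
            bad++;
        }
    }
    return bad;
}

/* Add variant: same loop shape, so only the arithmetic operation differs. */
uint32_t run_add(uint32_t iters)
{
    uint32_t bad = 0;
    for (uint32_t i = 0; i < iters; i++) {
        uint32_t x = a + b;  /* computed at run time on each iteration */
        if (x != 11) {
            bad++;
        }
    }
    return bad;
}
```

Calling run_mul(1000) from main in one binary and run_add(1000) in another, then disassembling both, should show whether the multiply is really there. Any cycle difference will still depend on the CPU model and the latency configured for the multiply opClass.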

As far as your blog post is concerned, please note that the format block
only defines how your instruction is executed in the simulation. In other
words, the value of `i' will not have any impact on the latency of the
instruction. To change the latency, you have to separately change the
latency of the opClass of the instruction you have added. For example, for
O3CPU you can look at the latencies of the different instructions here:

https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/src/cpu/o3/FuncUnitConfig.py

-Ayaz

On Tue, Jul 11, 2023 at 6:16 PM Nick F via gem5-users <gem5-users@gem5.org>
wrote:

> Good afternoon,
>
> I have been trying to use Gem5 to research and study the performance of
> several different computer architectures. However, I have been noticing
> that I may be unable to accurately model the differences in cycle length
> for computer programs.
>
> Take for example these two programs:
>
> #include <stdint.h>
>
> int main(void)
> {
>     for (uint32_t i = 0; i < 1000; i++) {
>         uint32_t x = 5 * 6;
>         if (x != 30) {
>             return 1;
>         }
>     }
>     return 0;
> }
>
> #include <stdint.h>
>
> int main(void)
> {
>     for (uint32_t i = 0; i < 1000; i++) {
>         uint32_t x = 5 + 6;
>         if (x != 11) {
>             return 1;
>         }
>     }
>     return 0;
> }
>
> Compiling and running both individually on a basic RISC-V CPU config, they
> both exit at exactly 1,297,721,000. However, in a real system, each
> multiply operation would take longer and I'd suspect doing 1000
> multiplications would have even a tiny difference in performance. My own
> research would also have difficulties analyzing relative performance unless
> I'm missing something.
>
> Even custom instructions seem to execute in a single CPU cycle regardless
> of how the hardware would be implemented.
>
> Is there a good way to define cycle delays in my Gem5 environment? I can
> implement a "multiply" function that inserts a bunch of no-ops, but that
> would make things more complicated as the program complexity grows.
>
> I've written a small blog post
> <https://fleker.medium.com/modeling-memristors-to-execute-physically-accurate-imply-operations-in-gem5-ef888b7dc49b>
> exploring some of what I've tried in the past week. If anyone here has any
> suggestions I'd be interested to hear them.
>
> Thanks,
>
> Nick
> _______________________________________________
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
>