Hi,
Make sure your system has enough MSHRs. Out of the box, the L1 and L2 caches
are set to have only a few MSHR entries each.
Also, the stride prefetcher is not the best choice; you may want to try
something better. DCPT gives me better numbers.
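
To see why the MSHR count matters, here is a rough back-of-the-envelope
sketch (plain Python, not gem5 code) based on Little's law: a cache can
sustain at most (outstanding misses) x (line size) / (miss latency) of miss
bandwidth. The 64-byte line and the 100 ns miss latency below are only
assumptions for illustration, and the function names are made up; plug in
the values from your own config and stats.

```python
import math


def mshr_bandwidth_bound(mshrs: int, line_bytes: int, miss_latency_ns: float) -> float:
    """Upper bound on sustained miss bandwidth in MB/s (1 MB = 1e6 bytes).

    Little's law: at most `mshrs` cache lines can be in flight at once,
    and each one occupies its MSHR for `miss_latency_ns`.
    """
    # bytes per ns equals GB/s; multiply by 1000 to get MB/s.
    return mshrs * line_bytes * 1000.0 / miss_latency_ns


def mshrs_needed(target_mb_s: float, line_bytes: int, miss_latency_ns: float) -> int:
    """Smallest MSHR count whose bound reaches `target_mb_s`."""
    return math.ceil(target_mb_s * miss_latency_ns / (1000.0 * line_bytes))


if __name__ == "__main__":
    # Illustrative assumptions, not measured values: 64 B lines, 100 ns misses.
    line_bytes, latency_ns = 64, 100.0
    for mshrs in (4, 20, 64):
        bound = mshr_bandwidth_bound(mshrs, line_bytes, latency_ns)
        print(f"{mshrs:3d} MSHRs -> at most {bound:8.1f} MB/s")
    # peakBW reported in the quoted stats.txt is 12800 MB/s.
    print("MSHRs needed for 12800 MB/s:", mshrs_needed(12800, line_bytes, latency_ns))
```

Under these assumptions a handful of MSHRs caps the achievable bandwidth
well below the DRAM peak, no matter how fast the memory controller is. A
prefetcher only helps if it also has enough MSHRs to keep its requests in
flight.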

On Fri, Apr 15, 2022 at 4:57 AM Zicong Wang via gem5-users <
gem5-users@gem5.org> wrote:

> Hi Jason,
>
>   We are testing the memory bandwidth program STREAM
> (https://www.cs.virginia.edu/stream/), but the results show that the CPU
> cannot fully utilize the DDR bandwidth: the achieved bandwidth is quite
> low, about 1/10 of the peak bandwidth (peakBW in stats.txt). I tested
> the STREAM binary on my x86 computer and got near-peak bandwidth, so I
> believe the program is OK.
>
>   I've seen the mailing list thread
> https://www.mail-archive.com/gem5-users@gem5.org/msg12965.html, and I
> think I've hit a similar problem. So I tried the suggestions proposed by
> Andreas, including *enabling the L1/L2 prefetcher* and *using the ARM
> detailed CPU*. Although these methods improve the bandwidth, the
> results show that the effect is limited. I've also tested the STREAM
> program in FS mode with the x86 O3/Minor/TimingSimple CPUs, and in SE
> mode with the Ruby option, but all the results are similar, with no
> essential difference.
>
>   I guess this is a general problem in simulation with gem5. I'm wondering
> whether this result is expected, or whether there is something wrong with
> the system model.
>
>   Two of the experimental results are attached for reference:
>
> *1. X86 O3CPU, SE mode, w/o L2 prefetcher:*
>
> ./build/X86/gem5.opt --outdir=m5out-stream configs/example/se.py
> --cpu-type=O3CPU --caches --l1d_size=256kB --l1i_size=256kB --l2cache
> --l2_size=8MB --mem-type=DDR3_1600_8x8 -c ../stream/stream
>
> *STREAM output:*
> -------------------------------------------------------------
>
> Function    Best Rate MB/s     Avg time     Min time     Max time
> Copy:            1099.0     0.014559     0.014559     0.014559
> Scale:           1089.7     0.014683     0.014683     0.014683
> Add:             1213.0     0.019786     0.019786     0.019786
> Triad:           1222.1     0.019639     0.019639     0.019639
> -------------------------------------------------------------
>
> *stats.txt (dram related):*
>
> system.mem_ctrls.dram.bytesRead          238807808   # Total bytes read (Byte)
> system.mem_ctrls.dram.bytesWritten       121179776   # Total bytes written (Byte)
> system.mem_ctrls.dram.avgRdBW           718.689026   # Average DRAM read bandwidth in MiBytes/s ((Byte/Second))
> system.mem_ctrls.dram.avgWrBW           364.688977   # Average DRAM write bandwidth in MiBytes/s ((Byte/Second))
> system.mem_ctrls.dram.peakBW              12800.00   # Theoretical peak bandwidth in MiByte/s ((Byte/Second))
> system.mem_ctrls.dram.busUtil                 8.46   # Data bus utilization in percentage (Ratio)
> system.mem_ctrls.dram.busUtilRead             5.61   # Data bus utilization in percentage for reads (Ratio)
> system.mem_ctrls.dram.busUtilWrite            2.85   # Data bus utilization in percentage for writes (Ratio)
> system.mem_ctrls.dram.pageHitRate            40.57   # Row buffer hit rate, read and write combined (Ratio)
>
>
> *2. X86 O3CPU, SE mode, w/ L2 prefetcher:*
>
> ./build/X86/gem5.opt --outdir=m5out-stream-l2hwp configs/example/se.py
> --cpu-type=O3CPU --caches --l1d_size=256kB --l1i_size=256kB --l2cache
> --l2_size=8MB --l2-hwp-type=StridePrefetcher --mem-type=DDR3_1600_8x8 -c
> ../stream/stream
>
> *STREAM output:*
> -------------------------------------------------------------
> Function    Best Rate MB/s     Avg time     Min time     Max time
> Copy:            1703.9     0.009390     0.009390     0.009390
> Scale:           1718.6     0.009310     0.009310     0.009310
> Add:             2087.3     0.011498     0.011498     0.011498
> Triad:           2227.2     0.010776     0.010776     0.010776
> -------------------------------------------------------------
>
> *stats.txt (dram related):*
>
> system.mem_ctrls.dram.bytesRead          238811712   # Total bytes read (Byte)
> system.mem_ctrls.dram.bytesWritten       121179840   # Total bytes written (Byte)
> system.mem_ctrls.dram.avgRdBW          1014.129912   # Average DRAM read bandwidth in MiBytes/s ((Byte/Second))
> system.mem_ctrls.dram.avgWrBW           514.598298   # Average DRAM write bandwidth in MiBytes/s ((Byte/Second))
> system.mem_ctrls.dram.peakBW              12800.00   # Theoretical peak bandwidth in MiByte/s ((Byte/Second))
> system.mem_ctrls.dram.busUtil                11.94   # Data bus utilization in percentage (Ratio)
> system.mem_ctrls.dram.busUtilRead             7.92   # Data bus utilization in percentage for reads (Ratio)
> system.mem_ctrls.dram.busUtilWrite            4.02   # Data bus utilization in percentage for writes (Ratio)
> system.mem_ctrls.dram.pageHitRate            75.37   # Row buffer hit rate, read and write combined (Ratio)
>
>
> *STREAM compiling options:*
>
> gcc -O2 -static -DSTREAM_ARRAY_SIZE=1000000 -DNTIMES=2 stream.c -o stream
>
> All the experiments were performed on the latest stable version
> (141cc37c2d4b93959d4c249b8f7e6a8b2ef75338, v21.2.1).
>
>   Thank you very much!
>
>
> Best Regards,
>
> Zicong
>
>
> _______________________________________________
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
> %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s