Hi Jason,

  We are testing the memory bandwidth benchmark STREAM
(https://www.cs.virginia.edu/stream/) in gem5, but the results show that the CPU
cannot fully utilize the DDR bandwidth: the achieved bandwidth is only about
1/10 of the peak bandwidth (peakBW in stats.txt). I ran the same STREAM
binary on my x86 computer and got near-peak bandwidth, so I believe the
program itself is OK.

  I've read the mailing-list thread
https://www.mail-archive.com/gem5-users@gem5.org/msg12965.html, and I think
I've hit a similar problem. So I tried the suggestions proposed by Andreas,
including enabling the L1/L2 prefetchers and using the ARM detailed CPU.
Although these methods improve the bandwidth, the effect is limited. I've also
run STREAM in FS mode with the x86 O3/Minor/TimingSimple CPUs, and in SE mode
with the Ruby option, but all the results are similar, with no essential
difference.

  I guess this is a general issue when simulating with gem5. Is this result
expected, or is there something wrong with the system model?

  Two of the experimental results are attached for reference:

1. X86 O3CPU, SE-mode, w/o l2 prefetcher:

./build/X86/gem5.opt --outdir=m5out-stream configs/example/se.py 
--cpu-type=O3CPU --caches --l1d_size=256kB --l1i_size=256kB --l2cache 
--l2_size=8MB --mem-type=DDR3_1600_8x8 -c ../stream/stream

STREAM output:

-------------------------------------------------------------

Function    Best Rate MB/s     Avg time     Min time     Max time
Copy:            1099.0     0.014559     0.014559     0.014559
Scale:           1089.7     0.014683     0.014683     0.014683
Add:             1213.0     0.019786     0.019786     0.019786
Triad:           1222.1     0.019639     0.019639     0.019639
-------------------------------------------------------------
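As a sanity check on how these rates relate to the timings, here is a minimal Python sketch that recomputes the "Best Rate" column from the minimum times above. It assumes STREAM's default 8-byte double elements and its usual byte counting (two arrays touched per element for Copy/Scale, three for Add/Triad); the variable and function names are only illustrative.

```python
# Recompute STREAM's "Best Rate" from the array size and the printed min time.
ARRAY_SIZE = 1_000_000   # -DSTREAM_ARRAY_SIZE from the compile line
BYTES_PER_ELEM = 8       # assumption: STREAM_TYPE left at its default (double)

# Arrays read or written per element by each kernel (STREAM's counting).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column from the output above, in seconds.
MIN_TIME_S = {
    "Copy": 0.014559,
    "Scale": 0.014683,
    "Add": 0.019786,
    "Triad": 0.019639,
}


def best_rate_mb_s(kernel):
    """Bytes moved per pass divided by the fastest pass, in 10^6 bytes/s."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEM * ARRAY_SIZE
    return 1.0e-6 * bytes_moved / MIN_TIME_S[kernel]


rates = {kernel: best_rate_mb_s(kernel) for kernel in ARRAYS_TOUCHED}
for kernel, rate in rates.items():
    print(f"{kernel:6s} {rate:10.1f} MB/s")
```

The recomputed values agree with the printed rates to within rounding of the printed times, so the reported numbers follow directly from the simulated run time of each kernel.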

stats.txt (dram related):

system.mem_ctrls.dram.bytesRead          238807808   # Total bytes read (Byte)
system.mem_ctrls.dram.bytesWritten       121179776   # Total bytes written (Byte)
system.mem_ctrls.dram.avgRdBW           718.689026   # Average DRAM read bandwidth in MiBytes/s ((Byte/Second))
system.mem_ctrls.dram.avgWrBW           364.688977   # Average DRAM write bandwidth in MiBytes/s ((Byte/Second))
system.mem_ctrls.dram.peakBW              12800.00   # Theoretical peak bandwidth in MiByte/s ((Byte/Second))
system.mem_ctrls.dram.busUtil                 8.46   # Data bus utilization in percentage (Ratio)
system.mem_ctrls.dram.busUtilRead             5.61   # Data bus utilization in percentage for reads (Ratio)
system.mem_ctrls.dram.busUtilWrite            2.85   # Data bus utilization in percentage for writes (Ratio)
system.mem_ctrls.dram.pageHitRate            40.57   # Row buffer hit rate, read and write combined (Ratio)
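The utilization figures in this excerpt appear to be simply the average bandwidths divided by peakBW. The short Python sketch below checks that arithmetic using the values copied from the stats above; the helper name is hypothetical, and the relation itself is an assumption inferred from the numbers rather than something stated in this message.

```python
# Values copied from the stats.txt excerpt (run 1, no L2 prefetcher).
avg_rd_bw = 718.689026   # system.mem_ctrls.dram.avgRdBW
avg_wr_bw = 364.688977   # system.mem_ctrls.dram.avgWrBW
peak_bw = 12800.00       # system.mem_ctrls.dram.peakBW


def util_pct(bandwidth, peak):
    """Bandwidth as a percentage of peak; both in the same unit."""
    return 100.0 * bandwidth / peak


bus_util_read = util_pct(avg_rd_bw, peak_bw)
bus_util_write = util_pct(avg_wr_bw, peak_bw)
bus_util = util_pct(avg_rd_bw + avg_wr_bw, peak_bw)

print(f"busUtilRead  {bus_util_read:6.2f} %")
print(f"busUtilWrite {bus_util_write:6.2f} %")
print(f"busUtil      {bus_util:6.2f} %")
```

Rounded to two decimals, these match the reported busUtilRead, busUtilWrite and busUtil, so the DRAM data bus is idle for more than 90% of the run.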




2. X86 O3CPU, SE-mode, w/ l2 prefetcher:

./build/X86/gem5.opt --outdir=m5out-stream-l2hwp configs/example/se.py 
--cpu-type=O3CPU --caches --l1d_size=256kB --l1i_size=256kB --l2cache 
--l2_size=8MB --l2-hwp-type=StridePrefetcher --mem-type=DDR3_1600_8x8 -c 
../stream/stream

STREAM output:

-------------------------------------------------------------
Function    Best Rate MB/s     Avg time     Min time     Max time
Copy:            1703.9     0.009390     0.009390     0.009390
Scale:           1718.6     0.009310     0.009310     0.009310
Add:             2087.3     0.011498     0.011498     0.011498
Triad:           2227.2     0.010776     0.010776     0.010776
-------------------------------------------------------------

stats.txt (dram related):

system.mem_ctrls.dram.bytesRead          238811712   # Total bytes read (Byte)
system.mem_ctrls.dram.bytesWritten       121179840   # Total bytes written (Byte)
system.mem_ctrls.dram.avgRdBW          1014.129912   # Average DRAM read bandwidth in MiBytes/s ((Byte/Second))
system.mem_ctrls.dram.avgWrBW           514.598298   # Average DRAM write bandwidth in MiBytes/s ((Byte/Second))
system.mem_ctrls.dram.peakBW              12800.00   # Theoretical peak bandwidth in MiByte/s ((Byte/Second))
system.mem_ctrls.dram.busUtil                11.94   # Data bus utilization in percentage (Ratio)
system.mem_ctrls.dram.busUtilRead             7.92   # Data bus utilization in percentage for reads (Ratio)
system.mem_ctrls.dram.busUtilWrite            4.02   # Data bus utilization in percentage for writes (Ratio)
system.mem_ctrls.dram.pageHitRate            75.37   # Row buffer hit rate, read and write combined (Ratio)
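To put the two runs side by side, the following Python sketch computes the per-kernel speedup from the L2 prefetcher and the STREAM rate as a rough fraction of peakBW. All numbers are copied from the two STREAM outputs in this message; the dictionary names are illustrative, and the fraction is only approximate because STREAM counts bytes as seen by the program while peakBW is a DRAM-side figure.

```python
PEAK_BW = 12800.00  # system.mem_ctrls.dram.peakBW, identical in both runs

# "Best Rate MB/s" column from each run.
no_prefetch = {"Copy": 1099.0, "Scale": 1089.7, "Add": 1213.0, "Triad": 1222.1}
l2_prefetch = {"Copy": 1703.9, "Scale": 1718.6, "Add": 2087.3, "Triad": 2227.2}

# Speedup of the prefetcher run over the baseline, per kernel.
speedup = {k: l2_prefetch[k] / no_prefetch[k] for k in no_prefetch}

# STREAM rate relative to the theoretical DRAM peak (rough comparison only).
frac_peak_base = {k: no_prefetch[k] / PEAK_BW for k in no_prefetch}
frac_peak_pf = {k: l2_prefetch[k] / PEAK_BW for k in l2_prefetch}

for k in no_prefetch:
    print(f"{k:6s} speedup {speedup[k]:.2f}x  "
          f"of peak: {100 * frac_peak_base[k]:.1f}% -> {100 * frac_peak_pf[k]:.1f}%")
```

With these inputs the prefetcher gives roughly a 1.5x to 1.8x improvement, yet even the best kernel stays below 20% of the theoretical peak.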




STREAM compiling options:

gcc -O2 -static -DSTREAM_ARRAY_SIZE=1000000 -DNTIMES=2 stream.c -o stream

All the experiments were run on the latest stable version 
(141cc37c2d4b93959d4c249b8f7e6a8b2ef75338, v21.2.1).

  Thank you very much!




Best Regards,

Zicong


_______________________________________________
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org