Hi,

Make sure your system has enough MSHRs. Out of the box, the L1 and L2 caches are configured with only a few MSHR entries, which caps the number of misses that can be in flight at once. Also, the stride prefetcher is not the best choice; you may want to try something better. DCPT gives me better numbers.
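As a rough illustration of why the MSHR count matters, here is a back-of-the-envelope sketch based on Little's law (bytes in flight = bandwidth x latency). The 64-byte line size, the 100 ns miss latency, and the MSHR counts in the loop are illustrative assumptions, not values measured from the runs in this thread; only the 12800 peak figure is taken from the peakBW stat quoted below.

```python
import math

LINE_BYTES = 64        # assumed cache line size
PEAK_MB_S = 12800.0    # peakBW as reported in stats.txt in this thread


def max_bandwidth_mb_s(mshrs, latency_ns, line_bytes=LINE_BYTES):
    """Upper bound on miss bandwidth (MB/s) with `mshrs` misses in flight.

    Little's law: bandwidth = bytes in flight / latency.
    bytes/ns equals GB/s, so multiply by 1000 to get MB/s.
    """
    return mshrs * line_bytes * 1000.0 / latency_ns


def mshrs_needed(target_mb_s, latency_ns, line_bytes=LINE_BYTES):
    """Smallest MSHR count whose bound reaches `target_mb_s`."""
    return math.ceil(target_mb_s * latency_ns / (1000.0 * line_bytes))


if __name__ == "__main__":
    latency_ns = 100.0  # hypothetical end-to-end miss latency
    for mshrs in (4, 8, 16, 32):  # hypothetical MSHR counts
        bound = max_bandwidth_mb_s(mshrs, latency_ns)
        print(f"{mshrs:3d} MSHRs -> at most {bound:8.1f} MB/s")
    print("MSHRs needed for peak:", mshrs_needed(PEAK_MB_S, latency_ns))
```

Under these assumptions, a cache with only a handful of MSHRs cannot keep enough lines in flight to approach the DRAM peak, regardless of how good the prefetcher is. Plug in the miss latency you actually observe in your own stats before drawing conclusions.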

On Fri, Apr 15, 2022 at 4:57 AM Zicong Wang via gem5-users <gem5-users@gem5.org> wrote:

> Hi Jason,
>
> We are testing the memory bandwidth program STREAM
> (https://www.cs.virginia.edu/stream/), but the results show that the CPU
> cannot fully utilize the DDR bandwidth: the achieved bandwidth is quite
> low, about 1/10 of the peak bandwidth (peakBW in stats.txt). I tested
> the STREAM binary on my x86 computer and got near-peak bandwidth, so I
> believe the program is OK.
>
> I've seen the mailing list thread
> https://www.mail-archive.com/gem5-users@gem5.org/msg12965.html, and I
> think I've hit a similar problem. So I tried the suggestions proposed by
> Andreas, including *enabling the l1/l2 prefetcher* and *using the ARM
> detailed CPU*. Although these methods improve the bandwidth, the results
> show they have limited effect. Besides, I've also tested the STREAM
> program in FS mode with the x86 O3/Minor/TimingSimple CPUs, and in SE
> mode with the ruby option, but all the results are similar and there is
> no essential difference.
>
> I guess it is a general problem in simulation with gem5. I'm wondering
> whether this result is expected, or whether there is something wrong
> with the system model.
>
> Two of the experimental results are attached for reference:
>
> *1. X86 O3CPU, SE-mode, w/o l2 prefetcher:*
>
> ./build/X86/gem5.opt --outdir=m5out-stream configs/example/se.py
> --cpu-type=O3CPU --caches --l1d_size=256kB --l1i_size=256kB --l2cache
> --l2_size=8MB --mem-type=DDR3_1600_8x8 -c ../stream/stream
>
> *STREAM output:*
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            1099.0     0.014559     0.014559     0.014559
> Scale:           1089.7     0.014683     0.014683     0.014683
> Add:             1213.0     0.019786     0.019786     0.019786
> Triad:           1222.1     0.019639     0.019639     0.019639
> -------------------------------------------------------------
>
> *stats.txt (dram related):*
>
> system.mem_ctrls.dram.bytesRead      238807808    # Total bytes read (Byte)
> system.mem_ctrls.dram.bytesWritten   121179776    # Total bytes written (Byte)
> system.mem_ctrls.dram.avgRdBW        718.689026   # Average DRAM read bandwidth in MiBytes/s ((Byte/Second))
> system.mem_ctrls.dram.avgWrBW        364.688977   # Average DRAM write bandwidth in MiBytes/s ((Byte/Second))
> system.mem_ctrls.dram.peakBW         12800.00     # Theoretical peak bandwidth in MiByte/s ((Byte/Second))
> system.mem_ctrls.dram.busUtil        8.46         # Data bus utilization in percentage (Ratio)
> system.mem_ctrls.dram.busUtilRead    5.61         # Data bus utilization in percentage for reads (Ratio)
> system.mem_ctrls.dram.busUtilWrite   2.85         # Data bus utilization in percentage for writes (Ratio)
> system.mem_ctrls.dram.pageHitRate    40.57        # Row buffer hit rate, read and write combined (Ratio)
>
> *2. X86 O3CPU, SE-mode, w/ l2 prefetcher:*
>
> ./build/X86/gem5.opt --outdir=m5out-stream-l2hwp configs/example/se.py
> --cpu-type=O3CPU --caches --l1d_size=256kB --l1i_size=256kB --l2cache
> --l2_size=8MB --l2-hwp-typ=StridePrefetcher --mem-type=DDR3_1600_8x8 -c
> ../stream/stream
>
> *STREAM output:*
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            1703.9     0.009390     0.009390     0.009390
> Scale:           1718.6     0.009310     0.009310     0.009310
> Add:             2087.3     0.011498     0.011498     0.011498
> Triad:           2227.2     0.010776     0.010776     0.010776
> -------------------------------------------------------------
>
> *stats.txt (dram related):*
>
> system.mem_ctrls.dram.bytesRead      238811712    # Total bytes read (Byte)
> system.mem_ctrls.dram.bytesWritten   121179840    # Total bytes written (Byte)
> system.mem_ctrls.dram.avgRdBW        1014.129912  # Average DRAM read bandwidth in MiBytes/s ((Byte/Second))
> system.mem_ctrls.dram.avgWrBW        514.598298   # Average DRAM write bandwidth in MiBytes/s ((Byte/Second))
> system.mem_ctrls.dram.peakBW         12800.00     # Theoretical peak bandwidth in MiByte/s ((Byte/Second))
> system.mem_ctrls.dram.busUtil        11.94        # Data bus utilization in percentage (Ratio)
> system.mem_ctrls.dram.busUtilRead    7.92         # Data bus utilization in percentage for reads (Ratio)
> system.mem_ctrls.dram.busUtilWrite   4.02         # Data bus utilization in percentage for writes (Ratio)
> system.mem_ctrls.dram.pageHitRate    75.37        # Row buffer hit rate, read and write combined (Ratio)
>
> *STREAM compiling options:*
>
> gcc -O2 -static -DSTREAM_ARRAY_SIZE=1000000 -DNTIMES=2 stream.c -o stream
>
> All the experiments were performed on the latest stable version
> (141cc37c2d4b93959d4c249b8f7e6a8b2ef75338, v21.2.1).
>
> Thank you very much!
>
> Best Regards,
>
> Zicong
>
> _______________________________________________
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org