Dear almighty mailing list: I am trying to study the performance impact of the memory controller count. After reading various documentation, tutorials, and mailing list threads, I chose full-system ALPHA_MOESI_CMP_token with the timing CPU, ddr3_1600_x64 memory, and the Pt2Pt network topology. My test workload is the multi-threaded (OpenMP) STREAM benchmark. Here is the command I use (options like outdir, disk-image, and script are omitted):
./build/ALPHA_MOESI_CMP_token/gem5.opt ./configs/example/ruby_fs.py -n 4 --cpu-type=timing --mem-size=256MB --mem-type=ddr3_1600_x64 --l1i_size=32kB --l1d_size=32kB --l2_size=2MB --num-l2caches=4 --topology=Pt2Pt --num-dirs=1

On a 4-core test platform, I swept both the number of threads and the number of directories (num-dirs, which effectively equals the number of memory controllers). STREAM's output indicates that while more threads increase memory bandwidth, adding directory controllers has very little impact on effective memory bandwidth. I checked stats.txt for the multiple-memory-controller simulations and found that all the memory controllers were active and had processed roughly the same number of memory requests. The total number of memory controller stall cycles in the single-MC case is about 10% higher than the stall cycles summed over all memory controllers in the 4-MC case.

To me, these data suggest that the default MC is fast enough to keep up with 4 cores/threads. Is this a reasonable conclusion? If I want to push the MCs to their limit, can I just increase the core frequency?

Thanks!

--
Runjie Zhang
Computer Engineering
University of Virginia
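P.S. In case it helps to see the methodology concretely, here is a sketch of how such a sweep can be scripted. The outdir naming and the rcS script names are illustrative placeholders for the options I omitted above (disk-image is omitted here as well), not the exact files I use:

#!/bin/bash
# Sweep the number of directories (memory controllers) and the STREAM
# thread count; outdir and --script names below are placeholders.
for DIRS in 1 2 4; do
  for THREADS in 1 2 4; do
    ./build/ALPHA_MOESI_CMP_token/gem5.opt \
        --outdir=m5out_dirs${DIRS}_thr${THREADS} \
        ./configs/example/ruby_fs.py -n 4 \
        --cpu-type=timing --mem-size=256MB --mem-type=ddr3_1600_x64 \
        --l1i_size=32kB --l1d_size=32kB --l2_size=2MB --num-l2caches=4 \
        --topology=Pt2Pt --num-dirs=${DIRS} \
        --script=stream_omp_${THREADS}thr.rcS
  done
done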
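The per-controller numbers can then be pulled out of each run's stats.txt with a quick grep; the controller and counter names below are only a guess, since the exact stat names vary between gem5 versions, so adjust them to whatever your stats.txt actually reports:

# Pull per-memory-controller read/write request counts out of stats.txt.
# 'mem_ctrls' and the counter names are assumptions; adjust to your version.
grep -iE 'mem_ctrls[0-9]*\.(readReqs|writeReqs)' m5out_dirs4_thr4/stats.txt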