Hi, I generated two benchmarks, one containing many store instructions and the other many load instructions, each accessing different cache blocks. The results of running them confuse me a lot.
For example:

Benchmark 1 (all stores):

    mov %eax,0x749ac0(%rcx)
    mov %eax,0xc99080(%rcx)
    mov %eax,0x91c980(%rcx)

Benchmark 2 (all loads):

    mov 0x749ac0(%rcx),%eax
    mov 0xc99080(%rcx),%eax
    mov 0x91c980(%rcx),%eax

All of these stores/loads miss in both the L1 and L2 caches. I then monitored the traffic on the memory bus (the bus between the L2 cache and DRAM). I anticipated that most requests would follow each other closely (with intervals of less than 10 cycles), since the memory instructions are right next to each other. Benchmark 2 (all load misses) matches this expectation very well. However, for benchmark 1 I found that most requests to DRAM are spaced about 100 cycles apart (roughly the memory access penalty). This suggests that a subsequent store instruction can be served only after the previous one has finished, i.e., a store miss blocks the memory access stream until it is satisfied, and only then can the next store proceed to the memory system.

As far as I know, gem5 uses a weak consistency model, so stores can be reordered and issued together. I also checked the code of the LSQ and the write buffer but still have no idea why most stores get served 100 cycles after the previous one. I tried adjusting the sizes of the ROB, LSQ, write buffer, etc., but this had no influence on the result. From some experiments I found that the stores are at least not blocked in the LSQ, so I guess the bottleneck may be in the write buffer? If anyone has had a similar experience or any idea, please let me know. That would be very helpful. Thank you!
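For concreteness, here is a sketch of the kind of structure-size knobs I mean, as exposed on gem5's classic cache and O3 CPU models. The class names, latencies, and specific values below are purely illustrative (not my actual config), and exact parameter names can differ between gem5 versions:

```python
# Illustrative sketch only: classic-cache and O3 CPU parameters that bound
# how many memory operations can be outstanding at once. Parameter names
# follow gem5's Python object model; values here are arbitrary examples.
from m5.objects import Cache, DerivO3CPU

class L2(Cache):
    size = '256kB'
    assoc = 8
    tag_latency = 10        # latency parameters vary by gem5 version
    data_latency = 10
    response_latency = 10
    mshrs = 16              # max outstanding misses the cache can track
    tgts_per_mshr = 8       # requests that can merge into a single miss
    write_buffers = 8       # buffered writebacks / outstanding writes

cpu = DerivO3CPU()
cpu.numROBEntries = 192     # reorder buffer size
cpu.LQEntries = 32          # load queue entries
cpu.SQEntries = 32          # store queue entries
```

Growing any of these (in particular `mshrs` and `write_buffers` on the L1/L2) is what I expected to change the spacing of the store misses, but it did not.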
_______________________________________________ gem5-users mailing list [email protected] http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
