Hi Majid, I'm no expert in the classic cache model but let me share my understanding of the issue.
First, did you notice that there are 2 coherent XBars in your system? You have the system XBar (system.membus) and the L1s-to-L2 XBar. You are currently looking at the L1s-to-L2 XBar. This crossbar should be considered an implementation detail since, in a real system, the crossbar is part of the shared L2. Your questions remain valid, though, except that the L2 XBar should need to track only 8 lines: the L2 is below the L2 XBar, so it is outside of its snoop domain. 16 lines should be enough for the membus.

However, that would hold only if lines were evicted before fill requests were sent downstream. It looks like this is not the case: a cache first issues a read and then evicts a line, if needed, when getting the data back (see in BaseCache, recvTimingResp->handleFill->allocateBlock->handleEvictions->evictBlock->{writebackBlk | cleanEvictBlk}). This should answer (2), if I'm correct.

As for (1), I think you are right: the check should be located right at the allocation point to be accurate. That said, consider this merely a safety check against the directory growing to an unreasonable size and hogging host memory. It is not a threat to simulation correctness. You are not really supposed to fit the directory size to your particular cache hierarchy. Actually, I think you can't do that, as you cannot easily bound the number of outstanding fill and evict requests in your system. I personally measured up to 12 allocated entries in the L2 XBar snoop filter compared to 8 L1 cache lines, and up to 17 entries in the membus snoop filter compared to 16 L1 + L2 cache lines. It might reach more with a different benchmark. Using a "big enough" maximum directory size is the recommended approach.

In real systems, the directory size and associativity are carefully chosen, based on the cache hierarchy configuration (size, inclusivity, associativity) and the coherence protocol, so that resources are neither wasted nor lacking.
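To make the fill-before-evict ordering discussed above concrete, here is a toy sketch in plain Python. It is not gem5 code and does not use any gem5 API: the function name, its parameters, the LRU replacement and the "oldest fill returns first" rule are all assumptions made up for illustration. It only shows why a snoop filter can track more lines than the cache can hold when the victim is evicted on fill rather than when the request is issued.

```python
from collections import OrderedDict, deque


def peak_filter_entries(num_lines, max_outstanding, addresses, evict_first):
    """Peak snoop-filter occupancy for a toy single-cache model.

    Hypothetical model, not gem5: the filter allocates an entry when a miss
    is issued and frees it when the cache evicts the line.
    """
    cache = OrderedDict()  # resident lines, oldest (LRU) first
    pending = deque()      # issued misses whose fill has not returned yet
    tracked = set()        # lines the snoop filter currently tracks
    peak = 0

    def evict_lru():
        victim, _ = cache.popitem(last=False)
        tracked.discard(victim)

    def complete_oldest_fill():
        line = pending.popleft()
        # Fill-first policy: the victim is only chosen when data comes back.
        if not evict_first and len(cache) >= num_lines:
            evict_lru()
        cache[line] = True

    for line in addresses:
        if line in cache:
            cache.move_to_end(line)  # hit: refresh LRU position
            continue
        if line in pending:
            continue                 # miss already in flight
        if len(pending) == max_outstanding:
            complete_oldest_fill()   # wait for a fill to free a request slot
        # Evict-first policy: make room before sending the request downstream.
        if evict_first and len(cache) + len(pending) >= num_lines:
            evict_lru()
        pending.append(line)
        tracked.add(line)
        peak = max(peak, len(tracked))

    while pending:
        complete_oldest_fill()
    return peak


# Stream over 100 distinct lines with an 8-line cache and 4 requests in flight.
print(peak_filter_entries(8, 4, range(100), evict_first=True))
print(peak_filter_entries(8, 4, range(100), evict_first=False))
```

With these arbitrary parameters the sketch reports a peak of 8 entries when evicting first and 12 when filling first, i.e. the cache capacity plus the number of requests in flight. The real bound in gem5 depends on MSHRs, write buffers and timing, so treat the numbers as illustrative only.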
Directories are sized for functional requirements first, whereas caches are sized for perf/area/power considerations first. In gem5, directory entries are allocated as needed and deallocated in a timely manner, which is optimal for modeling the functional behavior of a directory sized never to run out of available entries, as in most (all?) systems I am aware of. If you want to experiment with exotic snoop filters, you might need to implement them yourself.

Note that the classic caches in gem5 are not meant to model coherency traffic accurately and mostly focus on functional coherence. Ruby (e.g., the CHI protocol) models coherency more accurately but does not allow configuring the directory size in the current implementation. If you don't plan on studying snoop filters, just leave the max size at "plenty enough" and forget about it, I guess ;)

Regards,
Gabriel