Hi Majid,

I'm no expert in the classic cache model but let me share my understanding of 
the issue.

First, did you notice that there are two CoherentXBars in your system? You have
the system XBar (system.membus) and the L1s-to-L2 XBar. You are currently
looking at the L1s-to-L2 XBar. This crossbar should be considered an
implementation detail since, in a real system, the crossbar is part of the
shared L2.

Your questions remain valid, though. If anything, the L2 XBar should require
only 8 lines, since L2 is below the L2XBar and therefore outside of its snoop
domain, and 16 lines should be enough for the membus. However, that would hold
only if lines were evicted before fill requests were sent downstream. It looks
like this is not the case: a cache first issues a read and then evicts a line,
if needed, when getting the data back (see in BaseCache,
recvTimingResp->handleFill->allocateBlock->handleEvictions->evictBlock->
{writebackBlk | cleanEvictBlk}). This should answer (2), if I'm correct.
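
To make the ordering argument concrete, here is a minimal toy model in plain
Python. It is not gem5 code: the function name, the fully associative LRU
cache, and the "max_outstanding" batch of in-flight fills are all assumptions
made for illustration. It only shows that a filter tracking both resident and
requested lines can exceed the cache's line count when eviction happens on
fill completion.

```python
from collections import OrderedDict


def peak_filter_entries(num_lines, addresses, evict_first, max_outstanding=3):
    """Peak number of lines a toy snoop filter tracks for a single cache.

    Hypothetical model: the filter allocates an entry when a fill request is
    issued and frees it when the cache evicts the line.
    """
    assert 0 < max_outstanding <= num_lines
    cache = OrderedDict()   # resident lines, least recently used first
    pending = []            # fills issued downstream, data not yet returned
    tracked = set()         # lines the filter has an entry for
    peak = 0

    def evict_one():
        victim, _ = cache.popitem(last=False)  # drop the LRU line
        tracked.discard(victim)

    def complete_fills():
        for addr in pending:
            # Fill-first policy: the victim is only chosen when data arrives.
            if not evict_first and len(cache) >= num_lines:
                evict_one()
            cache[addr] = True
        pending.clear()

    for addr in addresses:
        if addr in cache:
            cache.move_to_end(addr)  # hit: refresh LRU position
            continue
        if addr in pending:
            continue                 # miss already in flight
        # Evict-first policy: make room before the request leaves the cache.
        if evict_first and cache and len(cache) + len(pending) >= num_lines:
            evict_one()
        pending.append(addr)
        tracked.add(addr)
        peak = max(peak, len(tracked))
        if len(pending) == max_outstanding:
            complete_fills()
    complete_fills()
    return peak


# Streaming access pattern over 32 distinct lines, 8-line cache.
evict_first_peak = peak_filter_entries(8, range(32), evict_first=True)
fill_first_peak = peak_filter_entries(8, range(32), evict_first=False)
```

With these arbitrary values the toy model peaks at 8 entries when evicting
first and at 11 when filling first, i.e. the cache size plus the number of
fills in flight. The real numbers depend on MSHRs, writebacks, and timing,
which this sketch does not model.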

As for (1), I think you are right: the check should be located right at the
allocation point to be accurate. Yet, consider this to be merely a safety check
in case the directory grows to an unreasonable size and hogs host memory. This
is not a threat to simulation correctness. You are not really supposed to fit
the directory size to your particular cache hierarchy. Actually, I think that
you can't do that, as you cannot easily bound the number of outstanding fill
and evict requests in your system. I personally measured up to 12 allocated
entries in the L2XBar snoop filter compared to 8 L1 cachelines and up to 17
entries in the membus snoop filter compared to 16 L1 + L2 cachelines. It might
reach more with a different benchmark. Using a "big enough" maximum directory
size is the recommended approach.
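
As a rough sanity check for "big enough", the sketch below converts a capacity
limit into a number of entries and compares it with an observed peak. It
assumes the limit is expressed in bytes of tracked cache capacity and is
divided by the cache line size to get an entry count, and that lines are 64
bytes; both are assumptions to verify against your gem5 version. The helper
names and the safety margin are hypothetical.

```python
def filter_entries(max_capacity_bytes, line_size_bytes=64):
    """Number of entries a capacity limit allows (assumed: capacity / line size)."""
    return max_capacity_bytes // line_size_bytes


def has_headroom(max_capacity_bytes, observed_peak, margin=4, line_size_bytes=64):
    """True if the limit leaves `margin` times the observed peak (arbitrary margin)."""
    return filter_entries(max_capacity_bytes, line_size_bytes) >= observed_peak * margin


MIB = 1024 * 1024

# Peak of 17 entries is the membus figure quoted above; limits are examples.
generous = has_headroom(8 * MIB, observed_peak=17)  # 131072 entries allowed
too_tight = has_headroom(1024, observed_peak=17)    # only 16 entries allowed
```

Here the 8 MiB limit leaves ample headroom, while a limit matched exactly to
16 cachelines would already be exceeded by the 17 entries measured above.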

In real systems, the directory size and associativity are carefully chosen,
based on the cache hierarchy configuration (size, inclusivity, associativity)
and the coherence protocol, so as neither to waste resources nor to run short
of them. Directories are sized for functional requirements first, as opposed to
caches, which are sized for perf/area/power considerations first. In gem5,
directory entries are allocated as needed and deallocated in a timely manner,
which is optimal for modeling the functional behavior of a directory sized to
never run out of available entries, as in most (all?) systems I am aware of.

If you want to experiment with exotic snoop filters, you might need to
implement them yourself. Note that the classic caches in gem5 are not meant to
model coherency traffic accurately and mostly focus on functional coherence.
Ruby (e.g., the CHI protocol) models coherency more accurately but does not
allow configuring the directory size in the current implementation. If you
don't plan on studying snoop filters, just leave the max size at "plenty
enough" and forget about it, I guess ;)

Regards,
Gabriel
_______________________________________________
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org