Re: [gem5-users] discussion on modeling shared L3 cache hierarchy

Andreas Hansson Wed, 13 Mar 2013 01:28:26 -0700

Hi Hanfeng,

Without having looked at the numbers, have you had a look at the generated
config.dot.pdf in m5out to ensure that the system architecture ends up
being what you intend?


You need pydot installed for the figure to be generated.
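As a quick sanity check before rerunning, something like the following could confirm that pydot is importable and that the figure was actually written. This is only a sketch: the helper name `check_config_figure` and the default `m5out` path are assumptions for illustration, and it does not call any gem5 API. Ideally it would be run with the same Python installation that gem5 uses.

```python
import importlib.util
import os


def check_config_figure(outdir="m5out"):
    """Report whether pydot is importable and whether the figure exists.

    Hypothetical helper: it only inspects the Python environment and the
    output directory; it does not call into gem5.
    """
    figure = os.path.join(outdir, "config.dot.pdf")
    return {
        "pydot_importable": importlib.util.find_spec("pydot") is not None,
        "figure_exists": os.path.isfile(figure),
        "figure_path": figure,
    }


if __name__ == "__main__":
    for key, value in check_config_figure().items():
        print(f"{key}: {value}")
```

If `pydot_importable` is false, the missing figure is expected; if it is true but the figure is absent, the output directory is probably not the one the run wrote to.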

Andreas


On 13/03/2013 08:19, "hanfeng QIN" <[email protected]> wrote:

>Hi all,
>
>I am modeling a cache hierarchy with private L2 caches and a shared L3 in
>gem5. For my experiments (running gem5 with the classic memory model in SE
>mode), I use the following main simulation parameters.
>
>1 core
>private L1 dcache: 32KB/8-way; icache: 32KB/4-way
>private L2 cache: 256KB/8-way
>shared L3 cache: 1MB/16-way
>
>I select workloads from SPEC CPU2006. The first 500 million instructions
>are fast-forwarded, the next 500 million instructions are used to warm up
>the caches, and a further 500 million instructions are then simulated in
>detail with the O3 CPU.
>
>I get the following simulation stats (these are the stats for
>'switch_cpus_1', i.e. the detailed measurement phase; the 1st column is the
>benchmark ID and the 2nd column is the corresponding metric)
>
>( A ). L1D$ miss rate:
>
>401 0.039705
>403 0.154247
>410 0.107384
>450 0.169413
>459 0.000102
>462 0.031115
>471 0.060141
>
>( B ). L2$ miss rate:
>
>401 0.463055
>403 0.900824
>410 0.344414
>450 0.820350
>459 0.989815
>462 0.997665
>471 0.964760
>
>( C ). L3$ miss rate:
>
>401 0.334149
>403 0.291530
>410 0.918030
>450 0.561909
>459 0.970418
>462 0.989723
>471 0.080612
>
>I am surprised by these statistics. Several workloads (403.gcc,
>459.GemsFDTD, 462.libquantum and 471.omnetpp) have L2 miss rates above
>90%, and for 459.GemsFDTD and 462.libquantum the L3 miss rate is also
>very close to 100%. I am not sure whether this is caused by my cache
>configuration or by the intrinsic behavior of the benchmarks. The relevant
>cache configuration files follow.
>
>-------------------------- config/common/CacheConfig.py --------------------------
>
>     if options.l3cache:
>         system.l3 = l3_cache_class(clock=options.clock,
>                                    size=options.l3_size,
>                                    assoc=options.l3_assoc,
>                                    block_size=options.cacheline_size)
>
>         system.tol3bus = CoherentBus(clock = options.clock, width = 32)
>         system.l3.cpu_side = system.tol3bus.master
>         system.l3.mem_side = system.membus.slave
>     else:
>         if options.l2cache:
>             system.l2 = l2_cache_class(clock=options.clock,
>                                        size=options.l2_size,
>                                        assoc=options.l2_assoc,
>                                        block_size=options.cacheline_size)
>
>             system.tol2bus = CoherentBus(clock=options.clock, width=32)
>             system.l2.cpu_side = system.tol2bus.master
>             system.l2.mem_side = system.membus.slave
>
>     for i in xrange(options.num_cpus):
>         if options.caches:
>             icache = icache_class(size=options.l1i_size,
>                                   assoc=options.l1i_assoc,
>                                   block_size=options.cacheline_size)
>             dcache = dcache_class(size=options.l1d_size,
>                                   assoc=options.l1d_assoc,
>                                   block_size=options.cacheline_size)
>
>             if options.l3cache:
>                 system.cpu[i].l2 = l2_cache_class(
>                     size=options.l2_size,
>                     assoc=options.l2_assoc,
>                     block_size=options.cacheline_size)
>                 system.cpu[i].tol2bus = CoherentBus()
>                 system.cpu[i].l2.cpu_side = system.cpu[i].tol2bus.master
>                 system.cpu[i].l2.mem_side = system.tol3bus.slave
>
>             if buildEnv['TARGET_ISA'] == 'x86':
>                 system.cpu[i].addPrivateSplitL1Caches(icache, dcache,
>                                                       PageTableWalkerCache(),
>                                                       PageTableWalkerCache())
>             else:
>                 system.cpu[i].addPrivateSplitL1Caches(icache, dcache)
>         system.cpu[i].createInterruptController()
>         if options.l3cache:
>             system.cpu[i].connectAllPorts(system.cpu[i].tol2bus,
>                                           system.membus)
>         else:
>             if options.l2cache:
>                 system.cpu[i].connectAllPorts(system.tol2bus,
>                                               system.membus)
>             else:
>                 system.cpu[i].connectAllPorts(system.membus)
>
>-------------------------- config/common/Caches.py --------------------------
>
>class L1Cache(BaseCache):
>     assoc = 2
>     hit_latency = 2
>     response_latency = 2
>     block_size = 64
>     mshrs = 4
>     tgts_per_mshr = 20
>     is_top_level = True
>
>class L2Cache(BaseCache):
>     assoc = 8
>     block_size = 64
>     hit_latency = 8
>     response_latency = 8
>     mshrs = 16
>     tgts_per_mshr = 16
>     write_buffers = 8
>
>class L3Cache(BaseCache):
>     assoc = 16
>     block_size = 64
>     hit_latency = 20
>     response_latency = 20
>     mshrs = 512
>     tgts_per_mshr = 20
>     write_buffers = 256
>
>
>
>Are there any mistakes I am missing?
>
>
>
>Thanks in advance,
>
>Hanfeng
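One way to put the per-level numbers quoted above in context is to multiply them into an approximate end-to-end miss rate per benchmark. The sketch below assumes that the reported values are local miss rates (misses divided by accesses at that level) and that the L2 traffic is dominated by L1D misses; it ignores instruction-side and writeback traffic, so the result is only a rough estimate. The function name `global_miss_rate` is hypothetical.

```python
# Local miss rates copied from tables (A)-(C) in the quoted message.
L1D = {"401": 0.039705, "403": 0.154247, "410": 0.107384, "450": 0.169413,
       "459": 0.000102, "462": 0.031115, "471": 0.060141}
L2 = {"401": 0.463055, "403": 0.900824, "410": 0.344414, "450": 0.820350,
      "459": 0.989815, "462": 0.997665, "471": 0.964760}
L3 = {"401": 0.334149, "403": 0.291530, "410": 0.918030, "450": 0.561909,
      "459": 0.970418, "462": 0.989723, "471": 0.080612}


def global_miss_rate(bench):
    """Approximate fraction of L1D accesses that miss in all three levels.

    Assumption: each table holds local miss rates, so the product of the
    three values estimates the share of L1D accesses reaching memory.
    """
    return L1D[bench] * L2[bench] * L3[bench]


if __name__ == "__main__":
    for bench in sorted(L1D):
        print(f"{bench}: {global_miss_rate(bench):.6f}")
```

Under these assumptions, a high local L2 or L3 miss rate can coexist with a small overall miss rate when the L1 already filters most accesses, so the per-level numbers alone may not be enough to tell a wiring problem from benchmark behavior.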



