Re: [gem5-users] discussion on modeling shared L3 cache, hierarchy

hanfeng QIN Wed, 13 Mar 2013 07:01:11 -0700

Hi Andreas,

Many thanks for your suggestion. The generated figure is just what I expected, so I can confirm that my cache configuration is correct.


However, I still need to think through these surprising simulation results.

Hanfeng

On 03/13/2013 09:36 PM, gem5-users-requ...@gem5.org wrote:

Hi Hanfeng,

Without having looked at the numbers, have you had a look at the generated
config.dot.pdf in m5out to ensure that the system architecture ends up
being what you intend?

You need pydot installed for the figure to be generated.

Andreas
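
The check suggested above can be sketched in a few lines of Python. This is only a minimal sketch: the helper name check_config_figure is hypothetical, the output directory is assumed to be the default m5out, and the script should be run with the same Python installation that gem5 uses, otherwise the pydot result may not reflect what gem5 sees.

```python
import importlib.util
import os


def check_config_figure(outdir="m5out"):
    """Report whether pydot is importable and whether config.dot.pdf exists.

    check_config_figure is a hypothetical helper, not part of gem5.
    """
    # find_spec returns None when a top-level module cannot be found.
    has_pydot = importlib.util.find_spec("pydot") is not None
    pdf_path = os.path.join(outdir, "config.dot.pdf")
    return {
        "pydot_installed": has_pydot,
        "figure_path": pdf_path,
        "figure_exists": os.path.isfile(pdf_path),
    }


if __name__ == "__main__":
    for key, value in check_config_figure().items():
        print(f"{key}: {value}")
```

If the figure is missing while pydot is reported as installed, the output directory passed to gem5 is the next thing to check.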


On 13/03/2013 08:19, "hanfeng QIN"<hanfeng.h...@gmail.com>  wrote:

>Hi all,
>
>I am modeling a cache hierarchy with private L2 caches and a shared L3
>in gem5. For my experiments (running gem5 with the classic memory model
>in SE mode), I set the main simulation parameters as follows.
>
>1 core
>private L1 dcache: 32KB/8-way; icache: 32KB/4-way
>private L2 cache: 256KB/8-way
>shared L3 cache: 1MB/16-way
>
>I select workloads from SPEC CPU2006. The first 500 million instructions
>are fast-forwarded, the next 500 million instructions are used to warm
>up the caches, and another 500 million instructions are then simulated
>in detail with the O3 CPU.
>
>I get the following simulation stats (these are for 'switch_cpus_1',
>the detailed measurement phase; the 1st column is the benchmark
>number and the 2nd column is the corresponding metric)
>
>(A) L1D$ miss rate:
>
>401 0.039705
>403 0.154247
>410 0.107384
>450 0.169413
>459 0.000102
>462 0.031115
>471 0.060141
>
>(B) L2$ miss rate:
>
>401 0.463055
>403 0.900824
>410 0.344414
>450 0.820350
>459 0.989815
>462 0.997665
>471 0.964760
>
>(C) L3$ miss rate:
>
>401 0.334149
>403 0.291530
>410 0.918030
>450 0.561909
>459 0.970418
>462 0.989723
>471 0.080612
>
>I am surprised by these statistics. Several workloads (such as 403.gcc,
>459.GemsFDTD, 462.libquantum and 471.omnetpp) have large L2 and L3
>cache miss rates, very close to 100%. I am not sure whether this is
>caused by my cache configuration or is intrinsic to the benchmarks.
>The related cache configuration files follow.
>
>-------------------- config/common/CacheConfig.py --------------------
>
>     if options.l3cache:
>         system.l3 = l3_cache_class(clock=options.clock,
>                                    size=options.l3_size,
>                                    assoc=options.l3_assoc,
>                                    block_size=options.cacheline_size)
>
>         system.tol3bus = CoherentBus(clock = options.clock, width = 32)
>         system.l3.cpu_side = system.tol3bus.master
>         system.l3.mem_side = system.membus.slave
>     else:
>         if options.l2cache:
>             system.l2 = l2_cache_class(clock=options.clock,
>                                        size=options.l2_size,
>                                        assoc=options.l2_assoc,
>                                        block_size=options.cacheline_size)
>
>             system.tol2bus = CoherentBus(clock = options.clock, width = 32)
>             system.l2.cpu_side = system.tol2bus.master
>             system.l2.mem_side = system.membus.slave
>
>     for i in xrange(options.num_cpus):
>         if options.caches:
>             icache = icache_class(size=options.l1i_size,
>                                   assoc=options.l1i_assoc,
>                                   block_size=options.cacheline_size)
>             dcache = dcache_class(size=options.l1d_size,
>                                   assoc=options.l1d_assoc,
>                                   block_size=options.cacheline_size)
>
>             if options.l3cache:
>                 system.cpu[i].l2 = l2_cache_class(size = options.l2_size,
>                                                   assoc = options.l2_assoc,
>                                                   block_size = options.cacheline_size)
>                 system.cpu[i].tol2bus = CoherentBus()
>                 system.cpu[i].l2.cpu_side = system.cpu[i].tol2bus.master
>                 system.cpu[i].l2.mem_side = system.tol3bus.slave
>
>             if buildEnv['TARGET_ISA'] == 'x86':
>                     system.cpu[i].addPrivateSplitL1Caches(
>                         icache, dcache,
>                         PageTableWalkerCache(), PageTableWalkerCache())
>             else:
>                     system.cpu[i].addPrivateSplitL1Caches(icache, dcache)
>         system.cpu[i].createInterruptController()
>         if options.l3cache:
>             system.cpu[i].connectAllPorts(system.cpu[i].tol2bus, system.membus)
>         else:
>             if options.l2cache:
>                 system.cpu[i].connectAllPorts(system.tol2bus, system.membus)
>             else:
>                 system.cpu[i].connectAllPorts(system.membus)
>
>---------------------- config/common/Caches.py ----------------------
>
>class L1Cache(BaseCache):
>     assoc = 2
>     hit_latency = 2
>     response_latency = 2
>     block_size = 64
>     mshrs = 4
>     tgts_per_mshr = 20
>     is_top_level = True
>
>class L2Cache(BaseCache):
>     assoc = 8
>     block_size = 64
>     hit_latency = 8
>     response_latency = 8
>     mshrs = 16
>     tgts_per_mshr = 16
>     write_buffers = 8
>
>class L3Cache(BaseCache):
>     assoc = 16
>     block_size = 64
>     hit_latency = 20
>     response_latency = 20
>     mshrs = 512
>     tgts_per_mshr = 20
>     write_buffers = 256
>
>
>
>Are there any mistakes I am missing?
>
>
>
>Thanks in advance,
>
>Hanfeng
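
One way to put the per-level numbers quoted above in perspective is to combine them into an approximate global miss fraction. The sketch below assumes that each reported rate is local (misses divided by accesses arriving at that cache) and that L2 traffic is dominated by L1D misses; it ignores instruction-cache misses, writebacks and prefetches, so the result is only a rough estimate. The function name global_miss_fraction is made up for illustration.

```python
# Local miss rates copied from tables (A), (B) and (C) in the quoted message.
L1D = {"401": 0.039705, "403": 0.154247, "410": 0.107384, "450": 0.169413,
       "459": 0.000102, "462": 0.031115, "471": 0.060141}
L2 = {"401": 0.463055, "403": 0.900824, "410": 0.344414, "450": 0.820350,
      "459": 0.989815, "462": 0.997665, "471": 0.964760}
L3 = {"401": 0.334149, "403": 0.291530, "410": 0.918030, "450": 0.561909,
      "459": 0.970418, "462": 0.989723, "471": 0.080612}


def global_miss_fraction(bench):
    """Approximate fraction of L1D accesses that miss in all three levels.

    Assumes the three rates are local and that each level only sees the
    misses of the level above it (a simplification).
    """
    return L1D[bench] * L2[bench] * L3[bench]


if __name__ == "__main__":
    print("bench  approx. fraction of L1D accesses reaching memory")
    for bench in sorted(L1D):
        print(f"{bench}    {global_miss_fraction(bench):.6f}")
```

Under these assumptions, a local L2 or L3 miss rate close to 100% can still correspond to a very small share of all data accesses when the L1D miss rate is itself tiny, as it is for 459.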


_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
