Hello,
I used gprof to see how much time each function takes. My OE function
accounts for 38% of the total time, and std::__introsort_loop accounts
for 45% of the running time; I guess the latter comes from the two sort
calls in my OE code. The adaptive routing itself doesn't take much time.
However, when I ran adaptive routing alone, there was no improvement in
sim_insts (it even decreased slightly). So I'm facing two problems:
1. The OE routing in the routeCompute function is too expensive.
2. Adaptive routing doesn't improve performance.
Do you have any suggestions?
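
If the sorts are only there to find the single best candidate output (an assumption, since the OE code itself is not shown in this thread), a linear scan gives the same answer without invoking std::sort at all. The following is a minimal sketch of that idea; pickMostCredits and its argument are hypothetical names and not part of garnet.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper, not garnet code: returns the index of the
// candidate output with the most free credits.
// creditsPerOutput[i] is assumed to already hold the summed credit
// count for candidate output i.
std::size_t pickMostCredits(const std::vector<int>& creditsPerOutput)
{
    assert(!creditsPerOutput.empty());
    // One linear pass instead of a full sort; on ties std::max_element
    // returns the first (lowest-index) maximum.
    auto best = std::max_element(creditsPerOutput.begin(),
                                 creditsPerOutput.end());
    return static_cast<std::size_t>(
        std::distance(creditsPerOutput.begin(), best));
}
```

For example, pickMostCredits({3, 7, 7, 1}) selects index 1. With only a handful of candidate outputs per router, the scan does a few comparisons per call and needs no extra allocation.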
Yuhang
On 09/12/2013 07:31 PM, Andreas Hansson wrote:
Hi Yuhang,
I suspect you call your routing function every cycle for every packet,
causing the massive slowdown. You can always do a profiling run to
figure out where the time is spent. Build gem5.perf and use google
perftools to analyse the output, or use gem5.prof and analyse it with
gprof.
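
If the routing function does turn out to run every cycle for every packet, one common remedy is to compute the route once when the head flit arrives and reuse the result afterwards. Below is a minimal sketch of that pattern; RouteCache and all of its members are hypothetical names for illustration, not garnet's actual classes, and where it would hook into the router is left open.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical sketch, not garnet code: remembers the output port chosen
// for each input virtual channel so that the routing function runs once
// per packet (on the head flit) instead of on every cycle.
class RouteCache
{
  public:
    using RouteFn = std::function<int(int destination)>;

    explicit RouteCache(RouteFn fn) : routeFn(std::move(fn)) {}

    // Call once when a head flit arrives on input VC `vc`.
    int onHeadFlit(int vc, int destination)
    {
        assert(routeFn);
        ++computeCount;
        const int outport = routeFn(destination);
        cached[vc] = outport;
        return outport;
    }

    // Call for body/tail flits, or when a stalled head flit retries.
    // Throws std::out_of_range if no route was cached for `vc`.
    int lookup(int vc) const { return cached.at(vc); }

    // Call after the tail flit has left the router.
    void release(int vc) { cached.erase(vc); }

    // Number of times the routing function was actually invoked.
    int computations() const { return computeCount; }

  private:
    RouteFn routeFn;
    std::unordered_map<int, int> cached;
    int computeCount = 0;
};
```

The trade-off for adaptive routing is that the choice is frozen at head-flit time, so it reflects the credit state at that moment rather than the state in later cycles.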
Good luck.
Andreas
From: Yuhang <[email protected]>
Reply-To: gem5 users mailing list <[email protected]>
Date: Thursday, 12 September 2013 18:25
To: gem5 users mailing list <[email protected]>
Subject: [gem5-users] Odd even and adaptive routing didn't improve the
performance
Hello all,
I implemented the odd-even scheme and adaptive routing in garnet. For
odd-even, I use the algorithm from the paper /The odd-even turn model
for adaptive routing/ (Ge-Ming Chiu, 2000). For adaptive routing, I use
get_credit_cnt(vcs) on each output to sum up all of its credits, and
choose the output with the most credits. I traced the flit flow and both
schemes work correctly. However, performance didn't improve after the
modification.
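
For reference, the two turn restrictions of the odd-even model can be written as a constant-time check with no sorting involved. This is a minimal sketch of the rules as stated in Chiu's paper, not the code used in my implementation; the Dir enum, the turnAllowed name, and column numbering starting at 0 are assumptions made for illustration.

```cpp
#include <cassert>

enum class Dir { East, West, North, South };

// Returns true if a packet travelling in direction `from` may turn to
// direction `to` at a router in mesh column `column` under the odd-even
// turn model. Only the 90-degree turns named by the two rules are
// restricted; U-turns are not checked in this sketch.
bool turnAllowed(Dir from, Dir to, int column)
{
    assert(column >= 0);
    const bool evenColumn = (column % 2 == 0);
    const bool vertical = [](Dir d) {
        return d == Dir::North || d == Dir::South;
    }(evenColumn ? to : from);

    // Even column: no East->North (rule 1) or East->South (rule 2) turn.
    if (evenColumn && from == Dir::East && vertical)
        return false;
    // Odd column: no North->West (rule 1) or South->West (rule 2) turn.
    if (!evenColumn && to == Dir::West && vertical)
        return false;
    return true;
}
```

For example, turnAllowed(Dir::East, Dir::North, 2) is false, while the same turn in column 3 is allowed.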
I ran the FFT and LU kernels from splash2 with the ALPHA MESI protocol,
the detailed CPU type, a 4x4 mesh, and 1000000000 max ticks.
                      FFT           FFT           RADIX         RADIX
                      with OE       without OE    with OE       without OE
                      and adaptive  and adaptive  and adaptive  and adaptive
host_inst_rate        1006          11708         1945          15865
sim_insts             15008035      15016804      19748978      19752713
total flits injected  1315661       1309101       1131643       1130144
average latency       20.4676       20.4485       19.9921       19.968
I noticed that host_inst_rate is extremely low with my implementation,
and sim_insts even dropped slightly. Is that because my modification is
too complex, so that each routing decision takes too many instructions?
Or did I just write the code wrong? I tried reducing both the L1 and L2
cache sizes to create more contention, but only got less than 1%
improvement in sim_insts. In addition, the benchmark runs very slowly
(it usually takes one day) with my modification and the reduced cache
sizes. Could anyone help me with this issue?
Sincerely,
Yuhang
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users