Hello Gelin and Giacomo,
Recently I was using gem5 in SE mode for some SPEC 2006 runs and I realized
there is a problem when restoring checkpoints.
In my case it doesn't behave as in Gelin's case, but I thought it better to add
it here than to create a new thread.
Some information:
I am simulating RISC-V; I didn't check whether this also happens on other
architectures.
The specific test I am using is perlbench test1 from SPEC2006.
I created a checkpoint with gem5, and when restoring it the following error
appears:
build/RISCV/arch/riscv/faults.cc:b4: panic: Illegal instruction 0x00000000 at
pc 0x00000000000106c4:
Memory Usage: 1214492 KBytes
Program aborted at tick 4436647500
--- BEGIN LIBC BACKTRACE ---
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0xba15cc)[0x55689b81c5cc]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0xbbb2ba)[0x55689b8362ba]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f3dac504980]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f3daaae0fb7]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f3daaae2921]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0x1f2a7f)[0x55689ae6da7f]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0xa8ffe7)[0x55689b70afe7]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0xa90190)[0x55689b70b190]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0x95a714)[0x55689b5d5714]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0x95b310)[0x55689b5d6310]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0x95d330)[0x55689b5d8330]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0x95d7a8)[0x55689b5d87a8]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0x96c45b)[0x55689b5e745b]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0xbac815)[0x55689b827815]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0xbdc120)[0x55689b857120]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0xbdca52)[0x55689b857a52]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0xb91a0e)[0x55689b80ca0e]
/data1/home/jvaquero/gem5_orig/build/RISCV/gem5.opt(+0x62e965)[0x55689b2a9965]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(PyCFunction_Call+0x96)[0x7f3dac924736]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x76e0)[0x7f3dac895b20]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17ba0f)[0x7f3dac88ca0f]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c0fc)[0x7f3dac88d0fc]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3)[0x7f3dac893303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17a803)[0x7f3dac88b803]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c2be)[0x7f3dac88d2be]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3)[0x7f3dac893303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17ba0f)[0x7f3dac88ca0f]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c0fc)[0x7f3dac88d0fc]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3)[0x7f3dac893303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17ba0f)[0x7f3dac88ca0f]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(PyEval_EvalCodeEx+0x3e)[0x7f3dac88d4ce]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(PyEval_EvalCode+0x1b)[0x7f3dac88e24b]
--- END LIBC BACKTRACE ---
Aborted (core dumped)
This occurs in gem5.opt as well as in gem5.debug.
After some debugging and tracing, comparing the full execution with the
restored one, I found that the main difference is in the following piece of
the trace:
394631500: system.switch_cpus: A0 T0 : 0x62ee2 @Perl_yyparse+408 : lh s3,
-1720(a4) : MemRead : D=0x0000000000000000 A=0x1362d0
FetchSeq=289047 CPSeq=255916 flags=(IsInteger|IsLoad)
394631500: system.switch_cpus: A0 T0 : 0x62ee6 @Perl_yyparse+412 : c_li a4,
1 : IntAlu : D=0x0000000000000001 FetchSeq=289048
CPSeq=255917 flags=(IsInteger)
394632000: system.switch_cpus: A0 T0 : 0x62ee8 @Perl_yyparse+414 : addi a3,
zero, 197 : IntAlu : D=0x00000000000000c5 FetchSeq=289049
CPSeq=255918 flags=(IsInteger)
394632000: system.switch_cpus: A0 T0 : 0x62eec @Perl_yyparse+418 : subw a4,
a4, s3 : IntAlu : D=0x0000000000000001 FetchSeq=289050
CPSeq=255919 flags=(IsInteger)
394632500: system.switch_cpus: A0 T0 : 0x62ef0 @Perl_yyparse+422 : c_slli
a4, 3 : IntAlu : D=0x0000000000000008 FetchSeq=289051
CPSeq=255920 flags=(IsInteger)
394633000: system.switch_cpus: A0 T0 : 0x62ef2 @Perl_yyparse+424 : c_add a4,
s7 : IntAlu : D=0x00000000001ae5d8 FetchSeq=289052
CPSeq=255921 flags=(IsInteger)
394633000: system.cpu.workload: Translating: 0x1ae5d8->0x515d8
394633000: system.cpu.dcache: access for ReadReq [515d8:515df] hit state: e (M)
writable: 1 readable: 1 dirty: 1 prefetched: 0 | tag: 0xa secure: 0 valid: 1 |
set: 0x57 way: 0x1
394634000: system.switch_cpus: A0 T0 : 0x62ef4 @Perl_yyparse+426 : c_ld a4,
0(a4) : MemRead : D=0xa423473885660018 A=0x1ae5d8 FetchSeq=289053
CPSeq=255922 flags=(IsInteger|IsLoad)
394634000: system.cpu.workload: Translating: 0x18c0c8->0x1c10c8
394634500: system.switch_cpus: A0 T0 : 0x62ef6 @Perl_yyparse+428 : sd a4,
200(s5) : MemWrite : D=0xa423473885660018 A=0x18c0c8
FetchSeq=289054 CPSeq=255923 flags=(IsInteger|IsStore)
394635000: system.cpu.dcache: access for WriteReq [1c10c8:1c10cf] hit state: e
(M) writable: 1 readable: 1 dirty: 1 prefetched: 0 | tag: 0x38 secure: 0 valid:
1 | set: 0x43 way: 0x1
394635000: system.cpu.dcache: satisfyRequest for WriteReq [1c10c8:1c10cf]
(write)
394635500: system.switch_cpus: A0 T0 : 0x62efa @Perl_yyparse+432 : bltu a3,
a5, 414 : IntAlu : FetchSeq=289055 CPSeq=255924
flags=(IsInteger|IsControl|IsDirectControl|IsCondControl)
394635500: system.switch_cpus: A0 T0 : 0x62efe @Perl_yyparse+436 : c_slli
a5, 32 : IntAlu : D=0x0000000400000000 FetchSeq=289056
CPSeq=255925 flags=(IsInteger)
394636000: system.switch_cpus: A0 T0 : 0x62f00 @Perl_yyparse+438 : lui a4,
309 : IntAlu : D=0x0000000000135000 FetchSeq=289057
CPSeq=255926 flags=(IsInteger)
394636000: system.switch_cpus: A0 T0 : 0x62f04 @Perl_yyparse+442 : c_srli
a5, 30 : IntAlu : D=0x0000000000000010 FetchSeq=289058
CPSeq=255927 flags=(IsInteger)
394636500: system.switch_cpus: A0 T0 : 0x62f06 @Perl_yyparse+444 : addi a4,
a4, 1560 : IntAlu : D=0x0000000000135618 FetchSeq=289059
CPSeq=255928 flags=(IsInteger)
394637000: system.switch_cpus: A0 T0 : 0x62f0a @Perl_yyparse+448 : c_add a5,
a4 : IntAlu : D=0x0000000000135628 FetchSeq=289060
CPSeq=255929 flags=(IsInteger)
The relevant part of the trace (bold in the original message) shows that at
tick 394634000 there is a C.LD instruction that loads from memory address
0x1ae5d8 and returns 0xa423473885660018. In the trace from the full execution,
that load returns 0.
Looking into the differences, I found that the main one is that the MMU
translates that address to 0x515d8, which already contains data. Where does
that data come from?
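The comparison of the two Exec traces described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: each instruction record sits on a single line in the trace file (the wrapping above comes from the mail client), both traces have already been trimmed so that they start at the same instruction, and the helper names (parse_inst, first_divergence) are hypothetical, not part of gem5. Ticks and CPU names are ignored on purpose, since system.cpu and system.switch_cpus differ between the runs.

```python
import re

# "<tick>: <cpu name>: A0 T0 : 0x<pc> ..." as printed in the traces above.
INST_RE = re.compile(r"^\s*\d+:\s+\S+:\s+A\d+\s+T\d+\s+:\s+(0x[0-9a-fA-F]+)")
# Result data (D=...) and memory address (A=...) fields of a record.
FIELD_RE = re.compile(r"\b([DA])=(0x[0-9a-fA-F]+)")


def parse_inst(line):
    """Return (pc, data, addr) for an instruction record, else None."""
    match = INST_RE.match(line)
    if match is None:
        return None  # e.g. cache or "Translating:" lines
    fields = dict(FIELD_RE.findall(line))
    data = fields.get("D")
    addr = fields.get("A")
    return (
        int(match.group(1), 16),
        None if data is None else int(data, 16),
        None if addr is None else int(addr, 16),
    )


def instructions(lines):
    """Yield parsed instruction records, skipping every other trace line."""
    for line in lines:
        record = parse_inst(line)
        if record is not None:
            yield record


def first_divergence(full_lines, restored_lines):
    """Return (index, full_record, restored_record) at the first mismatch.

    Assumes both inputs start at the same instruction. Returns None if no
    mismatch is found within the shorter of the two traces.
    """
    pairs = zip(instructions(full_lines), instructions(restored_lines))
    for index, (full, restored) in enumerate(pairs):
        if full != restored:
            return index, full, restored
    return None
```

Passing two open file objects works directly, because the function only iterates over lines, so traces of several hundred MB never need to be loaded into memory at once.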
After using gdb to check what puts that data there, and why the MMU considers
that physical address free, I discovered that the area is filled in by the
unserializeStore function during checkpoint restore, where it loads the
system.physmem.store0.pmem file.
So the data comes from code that ran before the checkpoint, and we looked at
the full execution trace to see where physical page 0x51000 is used:
469397500: system.cpu: A0 T0 : 0x12ffc @Perl_yylex+6674 : c_jr a4
: IntAlu : FetchSeq=429379 CPSeq=363289
flags=(IsInteger|IsControl|IsIndirectControl|IsUncondControl|IsCall)
469403000: system.cpu.workload: Translating: 0x1b580->0x51580
469486000: system.cpu.workload: Translating: 0x1b5c0->0x515c0
469488500: system.cpu: A0 T0 : 0x1b5b2 @Perl_yylex+40904 : jal zero,
-12884 : IntAlu : D=0x000000000001b5b6 FetchSeq=429404
CPSeq=363290 flags=(IsInteger|IsControl|IsDirectControl|IsUncondControl|IsCall)
469621000: system.cpu.workload: Translating: 0x18340->0x4e340
469704000: system.cpu.workload: Translating: 0x18380->0x4e380
So it looks like what is read back from the checkpoint is an instruction from
the .text segment.
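To find this kind of overlap without reading the traces by hand, the "Translating:" lines can be grouped by physical page. The sketch below assumes 4 KiB pages (consistent with the physical page 0x51000 mentioned above) and uses hypothetical helper names; it reports every physical page that was reached from more than one virtual page.

```python
import re
from collections import defaultdict

PAGE_SHIFT = 12  # assumption: 4 KiB pages

# Matches e.g. "system.cpu.workload: Translating: 0x1ae5d8->0x515d8"
TRANSLATE_RE = re.compile(
    r"Translating:\s*(0x[0-9a-fA-F]+)->(0x[0-9a-fA-F]+)"
)


def physical_page_users(lines):
    """Map physical page number -> set of virtual page numbers using it."""
    users = defaultdict(set)
    for line in lines:
        match = TRANSLATE_RE.search(line)
        if match is None:
            continue
        vaddr, paddr = (int(value, 16) for value in match.groups())
        users[paddr >> PAGE_SHIFT].add(vaddr >> PAGE_SHIFT)
    return users


def shared_pages(lines):
    """Return only the physical pages reached from several virtual pages."""
    return {
        ppage: vpages
        for ppage, vpages in physical_page_users(lines).items()
        if len(vpages) > 1
    }


if __name__ == "__main__":
    # Translation lines copied from the two traces in this message.
    sample = [
        "394633000: system.cpu.workload: Translating: 0x1ae5d8->0x515d8",
        "469403000: system.cpu.workload: Translating: 0x1b580->0x51580",
        "469486000: system.cpu.workload: Translating: 0x1b5c0->0x515c0",
        "469621000: system.cpu.workload: Translating: 0x18340->0x4e340",
    ]
    for ppage, vpages in shared_pages(sample).items():
        print(hex(ppage << PAGE_SHIFT),
              sorted(hex(v << PAGE_SHIFT) for v in vpages))
```

Feeding it the translation lines of both runs together, as in the sample, flags physical page 0x51000 as being used for virtual page 0x1b000 in the full run and for virtual page 0x1ae000 in the restored run.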
At this point my question is: shouldn't that page be marked as used in the
page table (Ptable) entries in m5.cpt?
If it is not there, is that because the page is not in use at the checkpoint?
But then shouldn't that memory area be zeroed, so that when the physical page
is reused its old content is not read?
Or am I wrong and misreading all of this? In that case, can you help me by
pointing out my errors?
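One way to check what the checkpoint image actually holds at a physical address is to read system.physmem.store0.pmem directly. The sketch below rests on two assumptions that should be verified against the gem5 version in use: that the .pmem file is a gzip-compressed raw memory image, and that store0 begins at physical address 0 (the store_base parameter is there in case it does not). The function names are hypothetical. It does not look at the page table entries in m5.cpt, so it can show that a page holds stale data but not whether the page is considered allocated.

```python
import gzip

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def read_u64(pmem_path, paddr, store_base=0):
    """Read the 64-bit little-endian word stored at physical address paddr."""
    with gzip.open(pmem_path, "rb") as image:
        image.seek(paddr - store_base)
        raw = image.read(8)
    if len(raw) != 8:
        raise ValueError("address is beyond the end of the memory image")
    # RISC-V is little-endian, so this matches what C.LD would load.
    return int.from_bytes(raw, "little")


def page_is_zero(pmem_path, paddr, page_size=PAGE_SIZE, store_base=0):
    """Return True if the page containing paddr holds only zero bytes."""
    page_start = (paddr - store_base) // page_size * page_size
    with gzip.open(pmem_path, "rb") as image:
        image.seek(page_start)
        return not any(image.read(page_size))
```

For the case in this message, read_u64(path, 0x515d8) would be expected to return the value seen by the faulty load if the stale .text contents were indeed saved in the checkpoint, and page_is_zero(path, 0x515d8) would be False.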
Thank you for your time and your help. I tried to attach the trace files, but
the tgz of both of them together was over 500 MB.
The commands used in these tests are:
Full execution:
gem5_orig/build/RISCV/gem5.opt --outdir gem5_orig/configs/example/se.py \
    --cpu-type=MinorCPU --bp-type=TAGE --caches \
    --mem-size 1073741824 \
    -c $SPEC_DIR/CPU2006/400.perlbench/exe/perlbench_base.riscv \
    -o " -I./lib attrs.pl"
Checkpoint generation:
gem5_orig/build/RISCV/gem5.opt --outdir gem5_orig/configs/example/se.py \
    --take-checkpoints 124853750,10000000 --cpu-type=MinorCPU --bp-type=TAGE \
    --caches \
    --mem-size 1073741824 \
    -c $SPEC_DIR/CPU2006/400.perlbench/exe/perlbench_base.riscv \
    -o " -I./lib attrs.pl"
Checkpoint restore:
gem5_orig/build/RISCV/gem5.opt --outdir gem5_orig/configs/example/se.py \
    --checkpoint-dir ./ -r 1 --cpu-type=MinorCPU \
    --bp-type=TAGE --caches --mem-size 1073741824 \
    -c $SPEC_DIR/CPU2006/400.perlbench/exe/perlbench_base.riscv \
    -o " -I./lib attrs.pl"
Thanks again, and sorry for the long post.
---- On Mon, 22 Nov 2021 14:23:48 +0100 Giacomo Travaglini via gem5-users
<gem5-users@gem5.org> wrote ----
Hi Gelin,
Are you compiling gem5 in debug mode?
You can do that by using “debug” instead of “opt”:
$scons build/ARM/gem5.debug -j`nproc`
Kind Regards
Giacomo
From: Gelin Fu via gem5-users <gem5-users@gem5.org>
Date: Monday, 22 November 2021 at 12:26
To: gem5-users@gem5.org <gem5-users@gem5.org>
Cc: Gelin Fu <20153...@cqu.edu.cn>
Subject: [gem5-users] Re: Problem with checkpoint and restoration in gem5 se
mode
Hi Giacomo, thanks for your reply.
I am not familiar with using gdb in SE mode, so I tried to use debug functions
such as curTick() and eventqDump(). But gdb tells me that there is no symbol
for eventqDump() or curTick, so I could only use backtrace when the program
aborted.
I am using the command below:
gdb --args $GEM5_BIN --outdir=$OUTPUT_PATH $GEM5_PATH/configs/example/se.py \
--num-cpu 1 --cpu-clock 2.5GHz --cpu-type O3_ARM_v7a_3 \
--restore-with-cpu O3_ARM_v7a_3 -r 1 --checkpoint-dir \
"$CHECK_PATH" --caches --mem-type DDR3_2133_8x8 --mem-size 1GB \
-c "$TARGET_PATH" --options "$DATA_PATH"
The gdb output is below:
(gdb) r
Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff5ca7921 in __GI_abort () at abort.c:79
#2 0x00007ffff5c9748a in __assert_fail_base (
fmt=0x7ffff5e1e750 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x55555868991a "when >= getCurTick()",
file=file@entry=0x555558689902 "build/ARM/sim/eventq.hh",
line=line@entry=766,
function=function@entry=0x555558689b20
<gem5::EventQueue::schedule(gem5::Event*, unsigned long,
bool)::__PRETTY_FUNCTION__> "void gem5::EventQueue::schedule(gem5::Event*,
gem5::Tick, bool)") at assert.c:92
#3 0x00007ffff5c97502 in __GI___assert_fail (
assertion=0x55555868991a "when >= getCurTick()",
file=0x555558689902 "build/ARM/sim/eventq.hh", line=766,
function=0x555558689b20 <gem5::EventQueue::schedule(gem5::Event*, unsigned
long, bool)::__PRETTY_FUNCTION__> "void
gem5::EventQueue::schedule(gem5::Event*, gem5::Tick, bool)") at assert.c:101
#4 0x0000555555cc1dfe in gem5::EventQueue::schedule (this=0x55555ad72ea0,
event=0x55555ace0800, when=1010, global=false)
at build/ARM/sim/eventq.hh:766
#5 0x0000555555dd3a94 in gem5::EventManager::schedule (this=0x55555ace0708,
event=..., when=1010) at build/ARM/sim/eventq.hh:1021
#6 0x00005555561fc1a9 in gem5::BaseCache::startup (this=0x55555ace0700)
at build/ARM/mem/cache/base.cc:169
(gdb) p curTick
No symbol "curTick" in current context.
(gdb) p curTick()
No symbol "curTick" in current context.
Kind regards
Gelin
_______________________________________________
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org