Hi Marco, I'm currently trying to track down bugs in checkpoint restore to get x86+Ruby+O3CPU working, and I'm having trouble replicating your bug. Could you please compile build/X86_MOESI_hammer/gem5.debug and run the same tests you have here to grab this backtrace? Also, can you collect and restore from checkpoint with a single CPU core and see what happens?
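In case it helps, here is a rough sketch of the two runs I have in mind, written as a small Python helper that only prints the command lines. The flags and paths are copied from the commands quoted below; the function names, the output directories and the separate single-core checkpoint directory are placeholders I made up for illustration, so adjust them to your setup. The debug binary itself would normally be built first with `scons build/X86_MOESI_hammer/gem5.debug`.

```python
import shlex

# Paths and flags are taken from the commands quoted in this thread.
# The helper names and the directory arguments are hypothetical.
GEM5 = "build/X86_MOESI_hammer/gem5.debug"
CONFIG = "configs/example/ruby_fs.py"
KERNEL = "system/x86_64-vmlinux-2.6.28.smp"
SCRIPT = "contrib/initscripts/parsec/fluidanimate.sh"


def checkpoint_cmd(num_cpus, outdir, ckpt_dir):
    """Build the command that takes one checkpoint with the TimingSimpleCPU."""
    return [GEM5, f"--outdir={outdir}", CONFIG, "-n", str(num_cpus),
            "--cpu-type=timing", f"--kernel={KERNEL}",
            f"--checkpoint-dir={ckpt_dir}", "--max-checkpoints=1",
            f"--script={SCRIPT}"]


def restore_cmd(num_cpus, outdir, ckpt_dir):
    """Build the command that restores with timing and switches to the O3CPU."""
    return [GEM5, f"--outdir={outdir}", CONFIG, "-n", str(num_cpus),
            "--cpu-type=detailed", f"--kernel={KERNEL}",
            f"--checkpoint-dir={ckpt_dir}", "-r", "0",
            "--restore-with-cpu=timing"]


if __name__ == "__main__":
    # Single-core variant of steps (1) and (2); directory names are placeholders.
    ckpt_dir = "m5out/checkpoints/fluidanimate-1cpu"
    print(shlex.join(checkpoint_cmd(1, "m5out/single/ckpt", ckpt_dir)))
    print(shlex.join(restore_cmd(1, "m5out/single/detailed", ckpt_dir)))
```

Running the 16-core restore under gdb with the debug binary should give a backtrace without the `<optimized out>` entries.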
Thanks!
Joel

On Wed, Aug 29, 2012 at 5:11 PM, Marco Elver <marco.el...@ed.ac.uk> wrote:
> Thank you, with the patch I can confirm that the assertion problem has
> been fixed (after recreating the checkpoint).
>
> My problems with the O3CPU persist, and was wondering if this is a
> problem specific to X86 or is it a general problem?
>
> -- Marco
>
> On 28/08/12 21:28, Nilay Vaish wrote:
> > The cause of the assert failure was tracked down recently by Jason
> > Power. The patch is on the review board. Here is the link -
> > http://reviews.gem5.org/r/1365
> >
> > It will be committed to the mainline soon.
> >
> > --
> > Nilay
> >
> >
> > On Tue, 28 Aug 2012, Marco Elver wrote:
> >
> >> Hi all,
> >>
> >> I would like to ask if what I am trying to do is even possible (and if
> >> so, how??), as I have been running into a few problems, despite
> >> following the advice I could find in older mailing-list threads or the
> >> wiki. My goal would be to run a full-system with ruby (with
> >> MOESI_CMP_directory), multiple processors of type O3CPU and the X86 ISA;
> >> I create a snapshot after the Linux kernel loaded and before the
> >> benchmark enters the ROI.
> >>
> >> With revision 9174:2171e04a2ee5 (Mon Aug 27 20:53:20 2012 -0400) from
> >> the dev repository, I tried the following:
> >> (1) Take a checkpoint with ruby_fs, the *MOESI_hammer* protocol
> >> (only one supporting checkpoints, according to Wiki) and the
> >> TimingSimpleCPU (succeeds):
> >> $> build/X86_MOESI_hammer/gem5.opt
> >> --outdir=m5out/rawdata/fluidanimate/ckpt configs/example/ruby_fs.py -n
> >> 16 --cpu-type=timing --kernel=system/x86_64-vmlinux-2.6.28.smp
> >> --checkpoint-dir=m5out/checkpoints/fluidanimate --max-checkpoints=1
> >> --script=contrib/initscripts/parsec/fluidanimate.sh
> >>
> >> (2) Resume from the checkpoint with the O3CPU, restore with
> >> TimingSimpleCPU (fails):
> >> $> build/X86_MOESI_hammer/gem5.opt
> >> --outdir=m5out/rawdata/fluidanimate/detailed configs/example/ruby_fs.py
> >> -n 16 --cpu-type=detailed --kernel=system/x86_64-vmlinux-2.6.28.smp
> >> --checkpoint-dir=m5out/checkpoints/fluidanimate -r 0
> >> --restore-with-cpu=timing
> >> [...]
> >> Switch at curTick count:10000
> >> info: Entering event queue @ 0. Starting simulation...
> >> Runtime Error at MOESI_hammer-dir.sm:1270, Ruby Time:
> >> 1111185: assert failure, PID: 2742
> >> press return to continue.
> >>
> >> Program aborted at cycle 555592500
> >>
> >> (3) Resume from the checkpoint with the TimingSimpleCPU fails in the
> >> same way as (2), as in (2) the CPU isn't even switched to the O3CPU
> >> before it fails.
> >>
> >> (4) Though if I try taking a snapshot right after starting the
> >> simulator (after ~ 10000000000 cycles, kernel still booting) and then
> >> try to restore with the TimingSimpleCPU, it works as expected; only the
> >> O3CPU fails with a segfault and the following backtrace:
> >> #0 0x0000000000cdff56 in MasterPort::sendTimingReq
> >> (this=<optimized out>, pkt=0x6f8a060)
> >> at build/X86/mem/port.cc:136
> >> #1 0x00000000005fbac5 in sendTiming (pkt=0x6f8a060,
> >> sendingState=0x61a7cc0, this=0x49a9e60)
> >> at build/X86/arch/x86/pagetable_walker.cc:173
> >> #2 X86ISA::Walker::WalkerState::sendPackets (this=0x61a7cc0)
> >> at build/X86/arch/x86/pagetable_walker.cc:631
> >> #3 0x00000000005fc8c2 in
> >> X86ISA::Walker::WalkerState::recvPacket (this=this@entry=0x61a7cc0,
> >> pkt=pkt@entry=0x1e99920) at
> >> build/X86/arch/x86/pagetable_walker.cc:590
> >> #4 0x00000000005fcb98 in X86ISA::Walker::recvTimingResp
> >> (this=0x43706c0, pkt=0x1e99920)
> >> at build/X86/arch/x86/pagetable_walker.cc:129
> >> #5 0x0000000000ce1f5b in PacketQueue::trySendTiming
> >> (this=0x42ba5e0)
> >> at build/X86/mem/packet_queue.cc:152
> >> #6 0x0000000000ce2929 in PacketQueue::sendDeferredPacket
> >> (this=0x42ba5e0)
> >> at build/X86/mem/packet_queue.cc:190
> >> #7 0x0000000000c391be in EventQueue::serviceOne
> >> (this=<optimized out>) at build/X86/sim/eventq.cc:204
> >> #8 0x0000000000c7d342 in simulate
> >> (num_cycles=9223372036854785807) at build/X86/sim/simulate.cc:71
> >> #9 0x0000000000b8e17c in _wrap_simulate__SWIG_0
> >> (args=<optimized out>)
> >> at build/X86/python/swig/event_wrap.cc:4755
> >> #10 _wrap_simulate (self=<optimized out>, args=<optimized out>)
> >> at build/X86/python/swig/event_wrap.cc:4804
> >> #11 0x00007fb32a094fc6 in PyEval_EvalFrameEx () from
> >> /lib/libpython2.7.so.1.0
> >>
> >> Trying to restore with ruby using MOESI_CMP_directory and the
> >> TimingSimpleCPU results in the same error as (2), with the difference
> >> that it finishes loading the checkpoint, resumes, but then fails after
> >> about a minute ("Runtime Error at MOESI_CMP_directory-dir.sm:485, Ruby
> >> Time: 12038425921: assert failure, PID: 19169"). Using the O3CPU still
> >> results in the same error as (4).
> >>
> >> In addition, I have seen workflows of: 1) create checkpoint without ruby
> >> and with the AtomicSimpleCPU 2) load checkpoint with ruby and the
> >> TimingSimpleCPU. I tried this, and it works if I set
> >> --restore-with-cpu=timing. But trying this with the O3CPU doesn't work,
> >> resulting in the same backtrace as (4).
> >>
> >> Is what I'm trying to do possible? If so, any workarounds I should
> >> know of?
> >>
> >> Thanks,
> >> Marco
> >>
> >>
> >> --
> >> The University of Edinburgh is a charitable body, registered in
> >> Scotland, with registration number SC005336.
> >>
> >> _______________________________________________
> >> gem5-users mailing list
> >> gem5-users@gem5.org
> >> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

--
Joel Hestness
PhD Student, Computer Architecture
Dept. of Computer Science, University of Wisconsin - Madison
Dept. of Computer Science, University of Texas - Austin
http://www.cs.utexas.edu/~hestness