Hi Marco,
  I'm currently trying to track down bugs in checkpoint restore to get
x86+Ruby+O3CPU working, and I'm having trouble replicating your bug.  Could
you please compile build/X86_MOESI_hammer/gem5.debug and run the same tests
you have here to grab this backtrace?  Also, can you collect and restore
from checkpoint with a single CPU core and see what happens?
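
  A rough sketch of the steps requested above, written as a dry run that
only prints the commands. The paths and options are copied from the commands
quoted later in this thread. The `-n 1` core count, the `fluidanimate-1core`
directory name, and the use of gdb to capture the backtrace are assumptions
made for illustration, not part of the original commands.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# Options mirror the ruby_fs.py invocations quoted below in this thread;
# only the binary (gem5.debug) and the core count (-n 1) differ.

GEM5=build/X86_MOESI_hammer/gem5.debug
KERNEL=system/x86_64-vmlinux-2.6.28.smp
SCRIPT=contrib/initscripts/parsec/fluidanimate.sh
# Hypothetical directory name, kept separate from the 16-core checkpoint.
CKPT_DIR=m5out/checkpoints/fluidanimate-1core

# Build the debug binary so the backtrace has full symbols.
build_cmd() {
    echo "scons $GEM5"
}

# Take a single-core checkpoint with the TimingSimpleCPU.
checkpoint_cmd() {
    echo "$GEM5 configs/example/ruby_fs.py -n 1 --cpu-type=timing" \
         "--kernel=$KERNEL --checkpoint-dir=$CKPT_DIR --max-checkpoints=1" \
         "--script=$SCRIPT"
}

# Restore with the O3CPU under gdb (assumption: gdb is used for the backtrace).
restore_cmd() {
    echo "gdb --args $GEM5 configs/example/ruby_fs.py -n 1 --cpu-type=detailed" \
         "--kernel=$KERNEL --checkpoint-dir=$CKPT_DIR -r 0" \
         "--restore-with-cpu=timing"
}

build_cmd
checkpoint_cmd
restore_cmd
```

  Copy the printed lines into a shell to run them. Once the restore crashes
inside gdb, `bt` prints the backtrace.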

  Thanks!
  Joel


On Wed, Aug 29, 2012 at 5:11 PM, Marco Elver <marco.el...@ed.ac.uk> wrote:

> Thank you, with the patch I can confirm that the assertion problem has
> been fixed (after recreating the checkpoint).
>
> My problems with the O3CPU persist, and I was wondering whether this is
> specific to X86 or a general problem.
>
> -- Marco
>
> On 28/08/12 21:28, Nilay Vaish wrote:
> > The cause of the assert failure was tracked down recently by Jason
> > Power. The patch is on the review board. Here is the link -
> > http://reviews.gem5.org/r/1365
> >
> > It will be committed to the mainline soon.
> >
> > --
> > Nilay
> >
> >
> > On Tue, 28 Aug 2012, Marco Elver wrote:
> >
> >> Hi all,
> >>
> >> I would like to ask if what I am trying to do is even possible (and if
> >> so, how??), as I have been running into a few problems, despite
> >> following the advice I could find in older mailing-list threads or the
> >> wiki. My goal is to run a full-system simulation with ruby (with
> >> MOESI_CMP_directory), multiple processors of type O3CPU and the X86 ISA;
> >> I create a snapshot after the Linux kernel has loaded and before the
> >> benchmark enters the ROI.
> >>
> >> With revision 9174:2171e04a2ee5 (Mon Aug 27 20:53:20 2012 -0400) from
> >> the dev repository, I tried the following:
> >>    (1) Take a checkpoint with ruby_fs, the *MOESI_hammer* protocol
> >> (the only one supporting checkpoints, according to the wiki) and the
> >> TimingSimpleCPU (succeeds):
> >>           $> build/X86_MOESI_hammer/gem5.opt \
> >>                --outdir=m5out/rawdata/fluidanimate/ckpt \
> >>                configs/example/ruby_fs.py -n 16 --cpu-type=timing \
> >>                --kernel=system/x86_64-vmlinux-2.6.28.smp \
> >>                --checkpoint-dir=m5out/checkpoints/fluidanimate \
> >>                --max-checkpoints=1 \
> >>                --script=contrib/initscripts/parsec/fluidanimate.sh
> >>
> >>    (2) Resume from the checkpoint with the O3CPU, restore with
> >> TimingSimpleCPU (fails):
> >>           $> build/X86_MOESI_hammer/gem5.opt \
> >>                --outdir=m5out/rawdata/fluidanimate/detailed \
> >>                configs/example/ruby_fs.py -n 16 --cpu-type=detailed \
> >>                --kernel=system/x86_64-vmlinux-2.6.28.smp \
> >>                --checkpoint-dir=m5out/checkpoints/fluidanimate -r 0 \
> >>                --restore-with-cpu=timing
> >>           [...]
> >>           Switch at curTick count:10000
> >>           info: Entering event queue @ 0.  Starting simulation...
> >>           Runtime Error at MOESI_hammer-dir.sm:1270, Ruby Time: 1111185: assert failure, PID: 2742
> >>           press return to continue.
> >>
> >>           Program aborted at cycle 555592500
> >>
> >>    (3) Resuming from the checkpoint with the TimingSimpleCPU fails in
> >> the same way as (2), since in (2) the CPU is not even switched to the
> >> O3CPU before the failure occurs.
> >>
> >>    (4) However, if I take a snapshot right after starting the
> >> simulator (after ~ 10000000000 cycles, kernel still booting) and then
> >> restore with the TimingSimpleCPU, it works as expected; only the
> >> O3CPU fails, with a segfault and the following backtrace:
> >>        #0  0x0000000000cdff56 in MasterPort::sendTimingReq (this=<optimized out>, pkt=0x6f8a060)
> >>            at build/X86/mem/port.cc:136
> >>        #1  0x00000000005fbac5 in sendTiming (pkt=0x6f8a060, sendingState=0x61a7cc0, this=0x49a9e60)
> >>            at build/X86/arch/x86/pagetable_walker.cc:173
> >>        #2  X86ISA::Walker::WalkerState::sendPackets (this=0x61a7cc0)
> >>            at build/X86/arch/x86/pagetable_walker.cc:631
> >>        #3  0x00000000005fc8c2 in X86ISA::Walker::WalkerState::recvPacket (this=this@entry=0x61a7cc0, pkt=pkt@entry=0x1e99920)
> >>            at build/X86/arch/x86/pagetable_walker.cc:590
> >>        #4  0x00000000005fcb98 in X86ISA::Walker::recvTimingResp (this=0x43706c0, pkt=0x1e99920)
> >>            at build/X86/arch/x86/pagetable_walker.cc:129
> >>        #5  0x0000000000ce1f5b in PacketQueue::trySendTiming (this=0x42ba5e0)
> >>            at build/X86/mem/packet_queue.cc:152
> >>        #6  0x0000000000ce2929 in PacketQueue::sendDeferredPacket (this=0x42ba5e0)
> >>            at build/X86/mem/packet_queue.cc:190
> >>        #7  0x0000000000c391be in EventQueue::serviceOne (this=<optimized out>)
> >>            at build/X86/sim/eventq.cc:204
> >>        #8  0x0000000000c7d342 in simulate (num_cycles=9223372036854785807)
> >>            at build/X86/sim/simulate.cc:71
> >>        #9  0x0000000000b8e17c in _wrap_simulate__SWIG_0 (args=<optimized out>)
> >>            at build/X86/python/swig/event_wrap.cc:4755
> >>        #10 _wrap_simulate (self=<optimized out>, args=<optimized out>)
> >>            at build/X86/python/swig/event_wrap.cc:4804
> >>        #11 0x00007fb32a094fc6 in PyEval_EvalFrameEx ()
> >>            from /lib/libpython2.7.so.1.0
> >>
> >> Trying to restore with ruby using MOESI_CMP_directory and the
> >> TimingSimpleCPU results in the same error as (2), with the difference
> >> that it finishes loading the checkpoint, resumes, but then fails after
> >> about a minute ("Runtime Error at MOESI_CMP_directory-dir.sm:485, Ruby
> >> Time: 12038425921: assert failure, PID: 19169"). Using the O3CPU still
> >> results in the same error as (4).
> >>
> >> In addition, I have seen the following workflow: 1) create a checkpoint
> >> without ruby and with the AtomicSimpleCPU, 2) load the checkpoint with
> >> ruby and the TimingSimpleCPU. I tried this, and it works if I set
> >> --restore-with-cpu=timing. But trying this with the O3CPU doesn't work,
> >> resulting in the same backtrace as (4).
> >>
> >> Is what I'm trying to do possible? If so, any workarounds I should
> >> know of?
> >>
> >> Thanks,
> >> Marco
> >>
> >>
> >> --
> >> The University of Edinburgh is a charitable body, registered in
> >> Scotland, with registration number SC005336.
> >>
> >> _______________________________________________
> >> gem5-users mailing list
> >> gem5-users@gem5.org
> >> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
> >>



-- 
  Joel Hestness
  PhD Student, Computer Architecture
  Dept. of Computer Science, University of Wisconsin - Madison
  Dept. of Computer Science, University of Texas - Austin
  http://www.cs.utexas.edu/~hestness