Hi Jason,

Thanks for your help. I think I've homed in on the source of the problem: the number of CPUs. Is there a reason why having multiple CPUs in a particular configuration would limit the simulator's ability to write a checkpoint?
Again, thank you for your help!

Best,
Sam

On Wed, Sep 8, 2021 at 11:12 AM Jason Lowe-Power <ja...@lowepower.com> wrote:

> Hi Sam,
>
> Sorry for the frustration. Writing better documentation is always #2 on
> the priority list :(.
>
> I always tell people not to trust any of the "options" to fs.py and se.py.
> Those scripts have gotten so far beyond "out of hand" at this point that
> they are almost useless. They are trying to be everything to everyone, and
> they end up just being a mess of spaghetti code and confusion.
>
> To take a checkpoint, you can add the following code to a python runscript:
>
>     m5.simulate(10000)
>     m5.checkpoint(<name of directory>)
>     m5.simulate(20000)
>     m5.checkpoint(<name of directory>)
>
> I tested the above code by adding it to the
> configs/learning_gem5/part1/two_level.py file.
>
> *Maybe* this is what --take-checkpoints is doing. It's certainly what it
> was *supposed* to do, but again, since this code has gotten so out of hand,
> who knows if it's actually doing what it advertises.
>
> If you want to use the m5ops to checkpoint, the code would look
> something like the following (this isn't tested and it's off the top of my
> head).
>
>     num = 0  # index used to name each checkpoint directory
>     while True:
>         exit_event = m5.simulate()
>         if exit_event.getCause() == 'checkpoint':
>             m5.checkpoint(m5.options.outdir + '/' + str(num))
>             num += 1
>         else:
>             break
>
> To restore from a checkpoint, pass the checkpoint directory as the only
> parameter to m5.instantiate(ckpt_dir=<checkpoint directory>).
>
> Hope this helps! If you're still experiencing a hang in this case, it's
> probably a bug in the drain() code somewhere. You can try to use one of the
> drain debug flags (I don't know exactly what these are... check gem5
> --debug-help for a list of debug flags). Making the python runscript do
> exactly what you expect will also help with debugging. When you control the
> script, adding prints is easy, too!
>
> Finally, the file src/python/m5/simulate.py may be helpful to figure out
> what's going on when instantiating, simulating, checkpointing, etc.
>
> Cheers,
> Jason
>
> On Wed, Sep 8, 2021 at 6:14 AM Thomas, Samuel via gem5-users
> <gem5-users@gem5.org> wrote:
>
>> Hi all,
>>
>> Just to follow up, because I can see that there have been some issues
>> with not including all of the requisite information in other threads,
>> here is the full output from what I described above.
>>
>> gem5 Simulator System. http://gem5.org
>> gem5 is copyrighted software; use the --copyright option for details.
>>
>> gem5 version 21.1.0.0
>> gem5 compiled Sep 7 2021 19:28:16
>> gem5 started Sep 8 2021 09:09:11
>> gem5 executing on sam-Precision-Tower-5810, pid 445665
>> command line: build/X86/gem5.opt -d $CURR_DIR/debug
>> $CURR_DIR/configs/example/fs.py --caches --l2cache --mem-type DDR3_1600_8x8
>> --mem-size 2GB --meta-size 512kB --num-cpus 4 --disk-image $DISK_PATH
>> --kernel $KERNEL_PATH --cpu-type $CPU_TYPE --script=$SCRIPT_PATH
>> --l2_size=1MB --take-checkpoints=10000,20000
>>
>> warn: iobus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: bridge.master is deprecated. `master` is now called `mem_side_port`
>> warn: membus.master is deprecated. `master` is now called `mem_side_ports`
>> warn: bridge.slave is deprecated. `slave` is now called `cpu_side_port`
>> warn: iobus.master is deprecated. `master` is now called `mem_side_ports`
>> warn: apicbridge.slave is deprecated. `slave` is now called `cpu_side_port`
>> warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: apicbridge.master is deprecated. `master` is now called `mem_side_port`
>> warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: iobus.master is deprecated. `master` is now called `mem_side_ports`
>> warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: tol2bus.master is deprecated. `master` is now called `mem_side_ports`
>> warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: membus.master is deprecated. `master` is now called `mem_side_ports`
>> warn: membus.master is deprecated. `master` is now called `mem_side_ports`
>> warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: membus.master is deprecated. `master` is now called `mem_side_ports`
>> warn: membus.master is deprecated. `master` is now called `mem_side_ports`
>> warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: membus.master is deprecated. `master` is now called `mem_side_ports`
>> warn: membus.master is deprecated. `master` is now called `mem_side_ports`
>> warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> warn: membus.master is deprecated. `master` is now called `mem_side_ports`
>> warn: membus.master is deprecated. `master` is now called `mem_side_ports`
>> warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
>> Global frequency set at 1000000000000 ticks per second
>> warn: system.workload.acpi_description_table_pointer.rsdt adopting orphan SimObject param 'entries'
>> [Detaching after fork from child process 445724]
>> [Detaching after fork from child process 445725]
>> build/X86/mem/mem_interface.cc:792: warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (2048 Mbytes)
>> build/X86/sim/kernel_workload.cc:46: info: kernel located at: /home/sam/Desktop/clean-gem5/gem5/_dist/binaries/x86_64-vmlinux-2.6.22.9
>> 0: system.pc.south_bridge.cmos.rtc: Real-time clock set to Sun Jan 1 00:00:00 2012
>> system.pc.com_1.device: Listening for connections on port 3464
>> 0: system.remote_gdb: listening for remote gdb on port 7008
>> build/X86/dev/intel_8254_timer.cc:125: warn: Reading current count from inactive timer.
>> **** REAL SIMULATION ****
>> build/X86/sim/simulate.cc:107: info: Entering event queue @ 0. Starting simulation...
>> Exiting @ tick 10000 because simulate() limit reached
>> build/X86/sim/simulate.cc:107: info: Entering event queue @ 10000. Starting simulation...
>>
>>
>> At this point, the program hangs, and occupies the ports until I manually
>> reset it even after killing the terminal process. Does this sound like
>> something anyone has seen before or can replicate? I feel like I'm going
>> crazy, and am not even sure how to debug this...
>>
>> Best,
>> Sam
>>
>> On Tue, Sep 7, 2021 at 9:56 AM Samuel Thomas <samuel_tho...@brown.edu> wrote:
>>
>>> Hi all,
>>>
>>> This is a very basic and perhaps silly question.
>>> I’m trying to take checkpoints in a gem5 program so that I can debug a
>>> particular segment of the execution more efficiently, but the flag seems
>>> to pause the execution of the environment and not actually take any
>>> checkpoints.
>>>
>>> I’m currently working from commit
>>> 87c121fd954ea5a6e6b0760d693a2e744c2200de (i.e., v21.1.0.0)
>>>
>>> And am running the following command line:
>>>
>>> build/X86/gem5.opt -d $CURR_DIR/debug $CURR_DIR/configs/example/fs.py
>>> --caches --l2cache --mem-type DDR3_1600_8x8 --mem-size 2GB --meta-size
>>> 512kB --num-cpus 4 --disk-image $DISK_PATH --kernel $KERNEL_PATH --cpu-type
>>> $CPU_TYPE --script=$SCRIPT_PATH --l2_size=1MB --take-checkpoints=10000,20000
>>>
>>> I assumed that --take-checkpoints was the proper way to do this, but it
>>> seems that the execution pauses at this point and no checkpoint files are
>>> produced in the out directory. Is there something that I’m doing wrong or a
>>> better way to go about doing this?
>>>
>>> Thanks for your help!
>>>
>>> Best,
>>> Sam
>>
>> _______________________________________________
>> gem5-users mailing list -- gem5-users@gem5.org
>> To unsubscribe send an email to gem5-users-le...@gem5.org
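For anyone landing on this thread later, here is a minimal sketch of the m5ops checkpoint loop from Jason's reply, written so that it runs without gem5. The helper name run_with_checkpoints, the cpt.N directory naming, and the FakeExitEvent stand-in are assumptions made for illustration; none of them are gem5 APIs. The print on each iteration follows the "adding prints is easy" advice and shows which exit cause came back before anything else happens.

```python
import os


def run_with_checkpoints(simulate, checkpoint, outdir):
    """Loop until simulate() exits for a reason other than a checkpoint request.

    simulate   -- callable returning an object with getCause()
                  (assumed to be m5.simulate inside a gem5 runscript)
    checkpoint -- callable taking a directory path
                  (assumed to be m5.checkpoint inside a gem5 runscript)
    Returns the list of checkpoint directories that were requested.
    """
    written = []
    while True:
        exit_event = simulate()
        cause = exit_event.getCause()
        print("simulate() returned: %s" % cause)
        if cause != "checkpoint":
            break
        # Number the directories so later checkpoints do not overwrite earlier ones.
        ckpt_dir = os.path.join(outdir, "cpt.%d" % len(written))
        checkpoint(ckpt_dir)
        written.append(ckpt_dir)
    return written


class FakeExitEvent:
    """Hypothetical stand-in for the exit event object; not a gem5 class."""

    def __init__(self, cause):
        self._cause = cause

    def getCause(self):
        return self._cause


def make_fake_simulate(causes):
    """Build a fake simulate() that yields the given exit causes in order."""
    pending = iter(causes)
    return lambda: FakeExitEvent(next(pending))


if __name__ == "__main__":
    taken = []
    fake = make_fake_simulate(["checkpoint", "checkpoint", "m5_exit instruction encountered"])
    print(run_with_checkpoints(fake, taken.append, "m5out"))
```

In a real runscript the call would presumably be run_with_checkpoints(m5.simulate, m5.checkpoint, m5.options.outdir) after m5.instantiate(). That wiring is untested against v21.1.0.0, and it does not address the multi-CPU hang itself.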