Hi Sam,

I would *guess* it's the draining code getting stuck in an infinite loop.
The draining code calls "drain" on all SimObjects in the system, and they
do their thing. Then, the drain code asks all SimObjects if they're done
draining. If not, it starts over and calls drain on all objects again. If
some object isn't draining properly or if there is some circular
dependence, there could be a "live lock" in this code. Just a guess, though.
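
To make that guess a bit more concrete, here is a toy Python model of the loop described above. It is only a sketch: ToyObject, drain_all, and max_rounds are made-up names for illustration and are not gem5's API. The round limit is there so the sketch terminates; without it, an object that never finishes would keep the loop spinning, which is the suspected live lock.

```python
class ToyObject:
    """Hypothetical stand-in for a SimObject (not gem5's real class)."""

    def __init__(self, name, rounds_needed):
        self.name = name
        # Number of drain() calls still needed; None means "never finishes".
        self.rounds_left = rounds_needed

    def drain(self):
        # Make one step of progress, unless finished or permanently stuck.
        if self.rounds_left:
            self.rounds_left -= 1

    def is_drained(self):
        return self.rounds_left == 0


def drain_all(objects, max_rounds=100):
    """Call drain() on every object until all of them report done.

    Returns the number of rounds used, or None if max_rounds was hit.
    """
    for round_number in range(1, max_rounds + 1):
        for obj in objects:
            obj.drain()
        if all(obj.is_drained() for obj in objects):
            return round_number
    return None
```

When every object needs a finite number of rounds, drain_all returns. Add one object created with rounds_needed=None and it only stops because of the limit.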

Cheers,
Jason

On Wed, Sep 8, 2021 at 10:00 AM Thomas, Samuel <samuel_tho...@brown.edu>
wrote:

> Hi Jason,
>
> Thanks for your help. I think I've homed in on the source of the problem:
> the number of CPUs. Is there a reason why having multiple CPUs in a
> particular configuration would limit the simulator's ability to write a
> checkpoint?
>
> Again, thank you for your help!
>
> Best,
> Sam
>
> On Wed, Sep 8, 2021 at 11:12 AM Jason Lowe-Power <ja...@lowepower.com>
> wrote:
>
>> Hi Sam,
>>
>> Sorry for the frustration. Writing better documentation is always #2 on
>> the priority list :(.
>>
>> I always tell people not to trust any of the "options" to fs.py and
>> se.py. Those scripts have gotten so far beyond "out of hand" at this point
>> that they are almost useless. They are trying to be everything to everyone,
>> and they end up just being a mess of spaghetti code and confusion.
>>
>> To take a checkpoint, you can add the following code to a python
>> runscript:
>>
>> m5.simulate(10000)
>> m5.checkpoint(<name of directory>)
>> m5.simulate(20000)
>> m5.checkpoint(<name of directory>)
>>
>> I tested the above code by adding it to the
>> configs/learning_gem5/part1/two_level.py file.
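
One thing to watch with the snippet above: as far as I can tell, m5.simulate(n) runs for n more ticks from wherever the simulation currently is, so the two checkpoints would land at ticks 10000 and 30000 rather than 10000 and 20000. That is my reading, not something verified on every version, so check src/python/m5/simulate.py. If you start from absolute tick targets, a small helper can convert them. The name relative_durations is hypothetical, and so is the directory naming in the commented usage.

```python
def relative_durations(absolute_ticks):
    """Convert ascending absolute tick targets into per-call durations."""
    durations = []
    previous = 0
    for tick in absolute_ticks:
        if tick <= previous:
            raise ValueError("ticks must be positive and strictly increasing")
        durations.append(tick - previous)
        previous = tick
    return durations


# Intended use inside a gem5 run script (commented out so this block
# runs without gem5; the directory naming is an arbitrary choice):
#
#   now = 0
#   for ticks in relative_durations([10000, 20000]):
#       m5.simulate(ticks)
#       now += ticks
#       m5.checkpoint("cpt.%d" % now)
```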
>>
>> *Maybe* this is what --take-checkpoints is doing. It's certainly what it
>> was *supposed* to do, but again, since this code has gotten so out of hand,
>> who knows if it's actually doing what it advertises.
>>
>> If you want to use the m5ops to checkpoint, the code would look
>> something like the following (this isn't tested and it's off the top of my
>> head).
>>
>> num = 0  # index used to name each checkpoint directory
>> while True:
>>   exit_event = m5.simulate()
>>   if exit_event.getCause() == 'checkpoint':
>>     m5.checkpoint(m5.options.outdir + '/' + str(num))
>>     num += 1
>>   else:
>>     break
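
The loop above can also be written so that it can be exercised without gem5, which helps when you are debugging the script itself. This is a sketch under assumptions: checkpoint_loop, max_checkpoints, and the cpt.N directory naming are made up for illustration, and the simulate and checkpoint arguments stand in for m5.simulate and m5.checkpoint.

```python
import os


def checkpoint_loop(simulate, checkpoint, outdir, max_checkpoints=10):
    """Simulate until an exit that is not a checkpoint request.

    simulate and checkpoint are callables (m5.simulate and m5.checkpoint
    in a real run script). Returns the checkpoint directories written.
    """
    written = []
    while len(written) < max_checkpoints:
        exit_event = simulate()
        if exit_event.getCause() != "checkpoint":
            break  # any other exit cause ends the run
        ckpt_dir = os.path.join(outdir, "cpt.%d" % len(written))
        checkpoint(ckpt_dir)
        written.append(ckpt_dir)
    return written
```

In a run script this would be called after m5.instantiate(), for example as checkpoint_loop(m5.simulate, m5.checkpoint, m5.options.outdir). The cap on the number of checkpoints is only a safety net against a guest that keeps requesting them.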
>>
>> To restore from a checkpoint, pass the checkpoint directory as the only
>> parameter to m5.instantiate(ckpt_dir=<checkpoint directory>).
>>
>> Hope this helps! If you're still experiencing a hang in this case, it's
>> probably a bug in the drain() code somewhere. You can try to use one of the
>> drain debug flags (I don't know exactly what these are... check gem5
>> --debug-help for a list of debug flags). Making the python runscript do
>> exactly what you expect will also help with debugging. When you control the
>> script, adding prints is easy, too!
>>
>> Finally, the file src/python/m5/simulate.py may be helpful to figure out
>> what's going on when instantiating, simulating, checkpointing, etc.
>>
>> Cheers,
>> Jason
>>
>> On Wed, Sep 8, 2021 at 6:14 AM Thomas, Samuel via gem5-users <
>> gem5-users@gem5.org> wrote:
>>
>>> Hi all,
>>>
>>> Just to follow up, because I can see that other threads have had
>>> trouble with posts that left out requisite information, here is
>>> the full output from what I described above.
>>>
>>> gem5 Simulator System.  http://gem5.org
>>> gem5 is copyrighted software; use the --copyright option for details.
>>>
>>> gem5 version 21.1.0.0
>>> gem5 compiled Sep  7 2021 19:28:16
>>> gem5 started Sep  8 2021 09:09:11
>>> gem5 executing on sam-Precision-Tower-5810, pid 445665
>>> command line: build/X86/gem5.opt -d $CURR_DIR/debug
>>> $CURR_DIR/configs/example/fs.py --caches --l2cache --mem-type DDR3_1600_8x8
>>> --mem-size 2GB --meta-size 512kB --num-cpus 4 --disk-image $DISK_PATH
>>> --kernel $KERNEL_PATH --cpu-type $CPU_TYPE --script=$SCRIPT_PATH
>>> --l2_size=1MB --take-checkpoints=10000,20000
>>>
>>> warn: iobus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: bridge.master is deprecated. `master` is now called `mem_side_port`
>>> warn: membus.master is deprecated. `master` is now called
>>> `mem_side_ports`
>>> warn: bridge.slave is deprecated. `slave` is now called `cpu_side_port`
>>> warn: iobus.master is deprecated. `master` is now called `mem_side_ports`
>>> warn: apicbridge.slave is deprecated. `slave` is now called
>>> `cpu_side_port`
>>> warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: apicbridge.master is deprecated. `master` is now called
>>> `mem_side_port`
>>> warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: iobus.master is deprecated. `master` is now called `mem_side_ports`
>>> warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: tol2bus.master is deprecated. `master` is now called
>>> `mem_side_ports`
>>> warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: membus.master is deprecated. `master` is now called
>>> `mem_side_ports`
>>> warn: membus.master is deprecated. `master` is now called
>>> `mem_side_ports`
>>> warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: membus.master is deprecated. `master` is now called
>>> `mem_side_ports`
>>> warn: membus.master is deprecated. `master` is now called
>>> `mem_side_ports`
>>> warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: membus.master is deprecated. `master` is now called
>>> `mem_side_ports`
>>> warn: membus.master is deprecated. `master` is now called
>>> `mem_side_ports`
>>> warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: tol2bus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> warn: membus.master is deprecated. `master` is now called
>>> `mem_side_ports`
>>> warn: membus.master is deprecated. `master` is now called
>>> `mem_side_ports`
>>> warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
>>> Global frequency set at 1000000000000 ticks per second
>>> warn: system.workload.acpi_description_table_pointer.rsdt adopting
>>> orphan SimObject param 'entries'
>>> [Detaching after fork from child process 445724]
>>> [Detaching after fork from child process 445725]
>>> build/X86/mem/mem_interface.cc:792: warn: DRAM device capacity (8192
>>> Mbytes) does not match the address range assigned (2048 Mbytes)
>>> build/X86/sim/kernel_workload.cc:46: info: kernel located at:
>>> /home/sam/Desktop/clean-gem5/gem5/_dist/binaries/x86_64-vmlinux-2.6.22.9
>>>       0: system.pc.south_bridge.cmos.rtc: Real-time clock set to Sun Jan
>>>  1 00:00:00 2012
>>> system.pc.com_1.device: Listening for connections on port 3464
>>> 0: system.remote_gdb: listening for remote gdb on port 7008
>>> build/X86/dev/intel_8254_timer.cc:125: warn: Reading current count from
>>> inactive timer.
>>> **** REAL SIMULATION ****
>>> build/X86/sim/simulate.cc:107: info: Entering event queue @ 0.  Starting
>>> simulation...
>>> Exiting @ tick 10000 because simulate() limit reached
>>> build/X86/sim/simulate.cc:107: info: Entering event queue @ 10000.
>>> Starting simulation...
>>>
>>>
>>> At this point, the program hangs, and occupies the ports until I
>>> manually reset it even after killing the terminal process. Does this sound
>>> like something anyone has seen before or can replicate? I feel like I'm
>>> going crazy, and am not even sure how to debug this...
>>>
>>> Best,
>>> Sam
>>>
>>> On Tue, Sep 7, 2021 at 9:56 AM Samuel Thomas <samuel_tho...@brown.edu>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> This is a very basic and perhaps silly question. I’m trying to take
>>>> checkpoints in a gem5 simulation so that I can debug a particular segment
>>>> of the execution more efficiently, but the --take-checkpoints flag seems
>>>> to pause execution without actually taking any checkpoints.
>>>>
>>>> I’m currently working from commit
>>>> 87c121fd954ea5a6e6b0760d693a2e744c2200de (i.e., v21.1.0.0)
>>>>
>>>> I am running the following command line:
>>>>
>>>> build/X86/gem5.opt -d $CURR_DIR/debug $CURR_DIR/configs/example/fs.py
>>>> --caches --l2cache --mem-type DDR3_1600_8x8 --mem-size 2GB --meta-size
>>>> 512kB --num-cpus 4 --disk-image $DISK_PATH --kernel $KERNEL_PATH --cpu-type
>>>> $CPU_TYPE --script=$SCRIPT_PATH --l2_size=1MB 
>>>> --take-checkpoints=10000,20000
>>>>
>>>> I assumed that --take-checkpoints was the proper way to do this, but it
>>>> seems that the execution pauses at this point and no checkpoint files are
>>>> produced in the out directory. Is there something that I’m doing wrong or a
>>>> better way to go about doing this?
>>>>
>>>> Thanks for your help!
>>>>
>>>> Best,
>>>> Sam
>>>
>>> _______________________________________________
>>> gem5-users mailing list -- gem5-users@gem5.org
>>> To unsubscribe send an email to gem5-users-le...@gem5.org
>>
>>