Hi,

Has anyone been able to reproduce this issue?
Thanks,
Ivan

On Sat, May 17, 2014 at 1:50 AM, Ivan Stalev <ids...@psu.edu> wrote:
> Hi Joel,
>
> I am using revision 10124. I removed all of my own modifications just to
> be safe.
>
> Running with gem5.opt and restoring from a boot-up checkpoint with
> --debug-flag=Exec, it appears that the CPU is stuck in some sort of
> infinite loop, executing this continuously:
>
> 5268959012000: system.switch_cpus0 T0 : @_spin_lock_irqsave+18.0 : CMP_M_I : limm t2d, 0 : IntAlu : D=0x0000000000000000
> 5268959012000: system.switch_cpus0 T0 : @_spin_lock_irqsave+18.1 : CMP_M_I : ld t1d, DS:[rdi] : MemRead : D=0x00000000fffffffe A=0xffffffff80822400
> 5268959012000: system.switch_cpus0 T0 : @_spin_lock_irqsave+18.2 : CMP_M_I : sub t0d, t1d, t2d : IntAlu : D=0x0000000000000000
> 5268959012000: system.switch_cpus0 T0 : @_spin_lock_irqsave+21.0 : JLE_I : rdip t1, %ctrl153, : IntAlu : D=0xffffffff80596897
> 5268959012000: system.switch_cpus0 T0 : @_spin_lock_irqsave+21.1 : JLE_I : limm t2, 0xfffffffffffffff9 : IntAlu : D=0xfffffffffffffff9
> 5268959012000: system.switch_cpus0 T0 : @_spin_lock_irqsave+21.2 : JLE_I : wrip , t1, t2 : IntAlu :
> 5268959012500: system.switch_cpus0 T0 : @_spin_lock_irqsave+16 : NOP : IntAlu :
> 5268959012500: system.switch_cpus0 T0 : @_spin_lock_irqsave+18.0 : CMP_M_I : limm t2d, 0 : IntAlu : D=0x0000000000000000
> 5268959012500: system.switch_cpus0 T0 : @_spin_lock_irqsave+18.1 : CMP_M_I : ld t1d, DS:[rdi] : MemRead : D=0x00000000fffffffe A=0xffffffff80822400
> 5268959012500: system.switch_cpus0 T0 : @_spin_lock_irqsave+18.2 : CMP_M_I : sub t0d, t1d, t2d : IntAlu : D=0x0000000000000000
> 5268959012500: system.switch_cpus0 T0 : @_spin_lock_irqsave+21.0 : JLE_I : rdip t1, %ctrl153, : IntAlu : D=0xffffffff80596897
> 5268959012500: system.switch_cpus0 T0 : @_spin_lock_irqsave+21.1 : JLE_I : limm t2, 0xfffffffffffffff9 : IntAlu : D=0xfffffffffffffff9
> 5268959012000: system.switch_cpus1 T0 : @_spin_lock_irqsave+21.2 : JLE_I : wrip , t1, t2 : IntAlu :
> 5268959012500: system.switch_cpus1 T0 : @_spin_lock_irqsave+16 : NOP : IntAlu :
> 5268959012500: system.switch_cpus1 T0 : @_spin_lock_irqsave+18.0 : CMP_M_I : limm t2d, 0 : IntAlu : D=0x0000000000000000
> 5268959012500: system.switch_cpus1 T0 : @_spin_lock_irqsave+18.1 : CMP_M_I : ld t1d, DS:[rdi] : MemRead : D=0x00000000fffffffe A=0xffffffff80822400
> 5268959012500: system.switch_cpus1 T0 : @_spin_lock_irqsave+18.2 : CMP_M_I : sub t0d, t1d, t2d : IntAlu : D=0x0000000000000000
> 5268959012500: system.switch_cpus1 T0 : @_spin_lock_irqsave+21.0 : JLE_I : rdip t1, %ctrl153, : IntAlu : D=0xffffffff80596897
> 5268959012500: system.switch_cpus1 T0 : @_spin_lock_irqsave+21.1 : JLE_I : limm t2, 0xfffffffffffffff9 : IntAlu : D=0xfffffffffffffff9
> 5268959012500: system.switch_cpus1 T0 : @_spin_lock_irqsave+21.2 : JLE_I : wrip , t1, t2 : IntAlu :
> 5268959013000: system.switch_cpus1 T0 : @_spin_lock_irqsave+16 : NOP : IntAlu :
>
> ...and so on, repeating without stopping.
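>
> (For anyone reading the trace: these microops are the spin-wait body of
> _spin_lock_irqsave -- both cores keep re-loading the lock word at
> 0xffffffff80822400, which holds 0xfffffffe (-2), and branching back
> while it stays <= 0. A minimal C sketch of that shape follows; it is
> for orientation only, not the actual 2.6.22 kernel source.)
>
>     static void spin_wait(volatile int *slock)
>     {
>         /* The CMP_M_I/JLE_I pair in the trace is this test and the
>          * backward branch; the NOP at +16 is the rep;nop (pause)
>          * inside the wait loop. The loop exits only when another
>          * core stores a positive value to the lock word -- which
>          * apparently never happens in the hung run. */
>         while (*slock <= 0)
>             ;
>     }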
>
> Using --debug-flag=LocalApic, the output does indeed stop shortly after
> restoring from the checkpoint. The last output is:
>
> ..
> 5269570990500: system.cpu1.interrupts: Reported pending regular interrupt.
> 5269570990500: system.cpu1.interrupts: Reported pending regular interrupt.
> 5269570990500: system.cpu1.interrupts: Generated regular interrupt fault object.
> 5269570990500: system.cpu1.interrupts: Reported pending regular interrupt.
> 5269570990500: system.cpu1.interrupts: Interrupt 239 sent to core.
> 5269571169000: system.cpu1.interrupts: Writing Local APIC register 5 at offset 0xb0 as 0.
>
> ...and no more output from this point on.
>
> I appreciate your help tremendously.
>
> Ivan
>
>
> On Fri, May 16, 2014 at 11:18 AM, Joel Hestness <jthestn...@gmail.com> wrote:
>
>> Hi Ivan,
>>   I believe that the email thread you previously referenced was related
>> to a bug that we identified and fixed with changeset 9624
>> <http://permalink.gmane.org/gmane.comp.emulators.m5.devel/19326>.
>> That bug was causing interrupts to be dropped in x86 when running with
>> the O3 CPU. Are you using a version of gem5 after that changeset? If
>> not, I'd recommend updating to a more recent version and trying to
>> reproduce this issue again.
>>
>> If you are using a more recent version of gem5, first, please let us
>> know which changeset you're on and whether you've made any changes.
>> Then, I'd recommend compiling gem5.opt and using the DPRINTF tracing
>> functionality to zero in on what is happening. Specifically, try
>> passing --debug-flag=Exec to see what the CPU cores are executing (you
>> may also want to pass --trace-start=<<tick>> with a simulator tick time
>> close to when the hang happens). The trace will include Linux kernel
>> symbols for at least some of the lines when execution is in the kernel
>> (e.g. while handling an interrupt). If your benchmark was compiled
>> without debugging symbols, the trace may only show the memory addresses
>> of the instructions executed within the application. My guess is that
>> you'll see kernel symbols for at least some of the instructions
>> executed for interrupts.
>>
>> If it appears that the CPUs are continuing to execute, try running with
>> --debug-flag=LocalApic. This will print the interrupts that each core
>> receives; if it stops printing at any point, something has gone wrong
>> and we'd have to do some deeper digging.
>>
>> Keep us posted on what you find,
>>   Joel
>>
>>
>> On Thu, May 15, 2014 at 11:16 PM, Ivan Stalev <ids...@psu.edu> wrote:
>>
>>> Hi Joel,
>>>
>>> I have tried several different kernels and disk images, including the
>>> default ones provided on the gem5 website in the x86-system.tar.bz2
>>> download. I run with this command:
>>>
>>> build/X86/gem5.fast -d m5out/test_run configs/example/fs.py \
>>>     --kernel=/home/mdl/ids103/full_system_images/binaries/x86_64-vmlinux-2.6.22.9.smp \
>>>     -n 2 --mem-size=4GB --cpu-type=atomic --cpu-clock=2GHz \
>>>     --script=rcs_scripts/run.rcS --max-checkpoints=1
>>>
>>> My run.rcS script simply contains:
>>>
>>> #!/bin/sh
>>> /sbin/m5 resetstats
>>> /sbin/m5 checkpoint
>>> echo 'booted'
>>> /extras/run
>>> /sbin/m5 exit
>>>
>>> where /extras/run is simply a C program with an infinite loop that
>>> prints a counter.
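>>>
>>> (Side note: the same pseudo-ops that the /sbin/m5 utility issues can
>>> also be invoked directly from C by linking against the m5op code in
>>> gem5's util/m5 directory -- handy if you want the checkpoint taken
>>> from inside the benchmark itself. A sketch, assuming the m5op.h
>>> interface; exact names and signatures may differ between revisions.
>>> The two arguments are a delay and a period in ns, with 0 meaning
>>> "now, once".)
>>>
>>>     #include <stdio.h>
>>>     #include "m5op.h"           /* from gem5's util/m5 */
>>>
>>>     int main(void)
>>>     {
>>>         m5_reset_stats(0, 0);   /* like /sbin/m5 resetstats */
>>>         m5_checkpoint(0, 0);    /* like /sbin/m5 checkpoint */
>>>         printf("booted\n");
>>>         /* ... run the workload here ... */
>>>         m5_exit(0);             /* like /sbin/m5 exit */
>>>         return 0;
>>>     }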
>>> The "run" program is just a dummy, and the same issue also persists when >>> running SPEC benchmarks or any other program. >>> >>> My dummy program is simply: >>> >>> int count=0; >>> printf("**************************** HEYY \n"); >>> while(1) >>> printf("\n %d \n", count++); >>> >>> Letting it run for a while, the only output is exactly this: >>> >>> booted >>> ******* >>> >>> It doesn't even finish printing the first printf. >>> >>> Another thing to add: In another scenario, I modified the kernel to call >>> an m5 pseudo instruction on every context switch, and then GEM5 prints that >>> a context switch occurred. Once again, in atomic mode this worked as >>> expected. However, in detailed, even the GEM5 (printf inside GEM5 itself) >>> output stopped along with the system output in the terminal. >>> >>> Thank you for your help. >>> >>> Ivan >>> >>> >>> On Thu, May 15, 2014 at 10:51 PM, Joel Hestness <jthestn...@gmail.com>wrote: >>> >>>> Hi Ivan, >>>> Can you please give more detail on what you're running? >>>> Specifically, can you give your command line, and which kernel, disk image >>>> you're using? Are you using checkpointing? >>>> >>>> Joel >>>> >>>> >>>> On Mon, May 12, 2014 at 10:52 AM, Ivan Stalev via gem5-users < >>>> gem5-users@gem5.org> wrote: >>>> >>>>> Hello, >>>>> >>>>> I am running X86 in full system mode. When running just 1 CPU, both >>>>> atomic and detailed mode work fine. However, with more than 1 CPU, atomic >>>>> works fine, but in detailed mode the system appears to hang shortly after >>>>> boot-up. GEM5 doesn't crash, but the system stops having any output. >>>>> Looking at the stats, it appears that instructions are still being >>>>> committed, but the actual applications/benchmarks are not making progress. >>>>> The issue persists with the latest version of GEM5. I also tried two >>>>> different kernel versions and several different disk images. >>>>> >>>>> I might be experiencing what seems to be the same issue that was found >>>>> about a year ago but appears to not have been fixed: >>>>> https://www.mail-archive.com/gem5-dev@gem5.org/msg08839.html >>>>> >>>>> Can anyone reproduce this or know of a solution? >>>>> >>>>> Thank you, >>>>> >>>>> Ivan >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> gem5-users mailing list >>>>> gem5-users@gem5.org >>>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users >>>>> >>>> >>>> >>>> >>>> -- >>>> Joel Hestness >>>> PhD Student, Computer Architecture >>>> Dept. of Computer Science, University of Wisconsin - Madison >>>> http://pages.cs.wisc.edu/~hestness/ >>>> >>> >>> >> >> >> -- >> Joel Hestness >> PhD Student, Computer Architecture >> Dept. of Computer Science, University of Wisconsin - Madison >> http://pages.cs.wisc.edu/~hestness/ >> > >