Hey Gutierrez, by "*sync* the disk image", do you mean making sure all disk modifications have actually been written to the disk (so it is up to date) before taking the checkpoint? How do I do that? I haven't previously tried taking a checkpoint with the COW layer disabled and then restarting from that checkpoint. All I have done is press "ctrl+c" to stop gem5 and take the checkpoint (--checkpoint-at-end); I rely on gem5 to take care of everything that needs to be handled when taking checkpoints.
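One guess at what "sync before checkpointing" could look like, as a minimal sketch only: run the Linux `sync` command inside the guest and then request the checkpoint from the guest with the `m5 checkpoint` utility, instead of stopping gem5 with ctrl+c. The helper name `build_checkpoint_script` and the optional warm-up command are hypothetical, and this assumes the `m5` binary is installed on the disk image. I have not verified that this changes the segfault behaviour.

```python
# Hypothetical helper: builds the text of a shell script meant to run inside
# the simulated guest. Nothing here talks to gem5 directly.
def build_checkpoint_script(warmup_cmd=None):
    """Return a guest script that flushes dirty pages, then checkpoints."""
    lines = ["#!/bin/sh"]
    if warmup_cmd:
        # Assumption: whatever should run before the checkpoint
        # (e.g. the benchmark warm-up) goes here.
        lines.append(warmup_cmd)
    # Flush the guest's dirty page cache to the (simulated) disk.
    lines.append("sync")
    # Ask gem5 for a checkpoint from inside the guest (m5 utility).
    lines.append("m5 checkpoint")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_checkpoint_script(), end="")
```

The generated text could be saved to a file and passed to fs.py as the run script (I believe via `--script`); whether the flushed writes then persist still depends on how the COW layer is configured.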
Best,
Da Zhang

On Thu, Jul 19, 2018 at 2:36 PM Gutierrez, Anthony <[email protected]> wrote:

> JIT was precisely the issue I was thinking was causing this. One thing may
> be necessary, that is to ensure you *sync* the disk image before taking
> your checkpoint.
>
> gem5’s debug flags should help you identify something like a hang, for
> example an ExecAll trace. A SyscallAll trace would most likely help you
> understand better what the JIT is doing.
>
> *From:* gem5-users <[email protected]> *On Behalf Of *Da Zhang
> *Sent:* Thursday, July 19, 2018 11:15 AM
> *To:* gem5 users mailing list <[email protected]>
> *Subject:* Re: [gem5-users] dacapo (java) benchmark suite encounters
> "SIGSEGV" and "null exception" during timing mode (fs mode) after
> restarting from a checkpoint
>
> Thanks for the suggestions.
>
> I have been trying a couple of solutions (I only test a small subset
> of the DaCapo benchmark suite, which encounters segfaults with O3CPU):
>
> 1. using TimingSimpleCPU: no segfaults
> 2. disable COW layer and write on the disk image when taking checkpoint:
>    there are still segfaults
> 3. take checkpoints with JIT compiler disabled (20x slowdown): no segfaults
> 4. take checkpoints during atomic mode (without warming up JIT): no
>    segfaults
> 5. take checkpoints with Java compressed OOPs disabled: there are still
>    segfaults
>
> One thing that I can't tell is whether the benchmark hangs, since there is
> no printing during the execution. Is there a statistic I can use to tell if
> the benchmark hangs?
>
> So far, all my experiments run with 1 CPU (even though some benchmarks
> are multithreaded). I attempted to take some checkpoints with more CPUs
> using the KVM CPU. But unfortunately, I got some "rcu_sched self-detected
> stall on CPU" issues. Any idea?
>
> On Mon, Jul 16, 2018 at 5:47 PM Gutierrez, Anthony <[email protected]> wrote:
>
> Da,
>
> Do you encounter the segfault only when restoring from a checkpoint?
> That is, if you do not use checkpoints can any DaCapo benchmark successfully
> complete under one of the simple CPU models (and not just KVM CPU)?
>
> If so, you may want to get a syscall trace (e.g., using strace) to see
> what sorts of files the JVM is trying to read etc. It’s possible that the
> VM generates some files that it will read back later. If you use
> checkpoints, due to the disk image COW layer, I do not believe any disk
> updates are checkpointed, thus these files will not persist, which could
> lead to some weird segfault issues. Not sure if this is happening in your
> case, but it may be worth investigating.
>
> I created some of the original Android disk images, and the original
> DaCapo image, and at that time I would typically run the benchmarks through
> the FS mode and Atomic CPU once, with the COW layer disabled, in order to
> generate the needed files on the disk image and have them persist. This was
> entirely for performance, however, to prevent the VMs from regenerating the
> same files for each run, but I can envision it causing issues during
> runtime as well. In particular, it seems your code is faulting while
> doing some XML serializing/deserializing; perhaps the XML file it is
> looking for is gone?
>
> Beyond that, assuming it is a real bug in gem5, I would recommend an
> ExecAll trace to figure out why the instruction at that PC is faulting.
>
> -Tony
>
> *From:* gem5-users [mailto:[email protected]] *On Behalf Of *Da Zhang
> *Sent:* Monday, July 16, 2018 1:50 PM
> *To:* gem5 users mailing list <[email protected]>
> *Subject:* Re: [gem5-users] dacapo (java) benchmark suite encounters
> "SIGSEGV" and "null exception" during timing mode (fs mode) after
> restarting from a checkpoint
>
> Hey Jason,
>
> There are a bunch of "warn: instruction 'prefetch_nta' unimplemented"
> warnings in atomic mode, during which the java benchmarks don't crash.
> However, there are no such warnings during timing mode.
> Does it imply that unimplemented instructions don't cause the problem?
> Any clues or suggestions to debug these problems?
>
> best,
> Da Zhang
>
> On Mon, Jul 16, 2018 at 1:32 PM Jason Lowe-Power <[email protected]> wrote:
>
> Hello,
>
> Are you seeing any warnings like "warn: Instruction XXX not implemented"?
>
> There are many X86 SIMD instructions that are currently unimplemented. I
> would bet that your application is using some of those instructions and
> getting 0's as the output instead of the correct value.
>
> The "right" way to solve this problem is to implement these instructions
> (and we would really appreciate it if you contribute your fixes back on
> https://gem5-review.googlesource.com). The other option is to recompile
> your applications without SIMD extensions (e.g., -march=athlon64 or
> whatever is the original x86-64 name in GCC). However, this likely requires
> compiling all of the java runtime in your case.
>
> Cheers,
> Jason
>
> On Mon, Jul 16, 2018 at 10:11 AM Da Zhang <[email protected]> wrote:
>
> To clarify, "SIGSEGV and null exceptions" happen to the benchmark
> suite, not gem5. Gem5 is running without errors. But in the
> system.pc.com_1.device files, I observe that most of the benchmarks crash
> due to SIGSEGV or null exceptions.
>
> Example:
>
> "
> x/system.pc.com_1.device
>
> buffers
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x00007f81d17742b7, pid=1474, tid=0x00007f81cf46d700
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_171-b11) (build 1.8.0_171-b11)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.171-b11 mixed mode linux-amd64 compressed oops)
> # Problematic frame:
> # J 1815 C2 org.apache.xml.serializer.ToHTMLStream.endElement(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)V (389 bytes) @ 0x00007f81d17742b7 [0x00007f81d1774280+0x37]
> #
> #
> "
>
> On Mon, Jul 16, 2018 at 11:39 AM Da Zhang <[email protected]> wrote:
>
> Hey guys,
>
> I am testing a java benchmark suite, dacapo, on gem5 with fs mode.
> Unfortunately, I encounter a lot of SIGSEGV and null exceptions during
> timing mode after restarting from the checkpoints.
>
> I am using linux kernel v4.8.13 and ubuntu-server-16.04.1 with
> oracle jdk v8.0_171-b11. To eliminate the influence of my modifications to
> gem5 src/ and configs/, I re-downloaded gem5 and checked out commit
> "ee2ffdc0fdb489767768e5273a4ccd7b51735c7c", which is the gem5 version I am
> working on. The checkpoint was taken using the kvm cpu with 1 CPU and 16GB
> memory. For the simulation, I use build/X86/gem5.opt (in order to enable
> assertions) with fs mode (configs/example/fs.py). Other options include
> "--cpu-type=DerivO3CPU -n 1 --mem-size=16GB --caches --l2cache
> --l2_size=${L2SIZE}" (I try L2SIZE from 256KB to 8MB). I test with 100ms
> warmup and 1ps real simulation time. No errors appear. But
> with longer real simulation time, the benchmark suite crashes with
> a segfault.
>
> I am able to run the dacapo benchmark suite in fs mode with kvm cpu,
> without any segfaults or exceptions. I have also tested some simple java
> benchmarks; neither segfaults nor exceptions occur.
>
> Does anyone have suggestions or experience with these issues?
>
> best,
> Da Zhang
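
For the ExecAll / SyscallAll suggestion quoted above, here is a rough sketch of how the restore command line from this thread might be assembled with a debug trace enabled. The simulator options are the ones listed earlier in the thread; `--debug-flags` and `--debug-file` are gem5's own options and go before the config script. The `-r 1` restore index, the trace file name, and the helper name `build_trace_command` are assumptions for illustration, not something taken from the thread.

```python
import shlex


# Hypothetical helper: assembles a gem5 command line as a list of arguments.
def build_trace_command(debug_flags, trace_file, l2_size, restore_index=1):
    """Return argv for a checkpoint restore run with debug tracing on."""
    return [
        "build/X86/gem5.opt",
        # gem5 options must come before the config script.
        "--debug-flags=" + ",".join(debug_flags),
        "--debug-file=" + trace_file,
        "configs/example/fs.py",
        # Options quoted from the original report.
        "--cpu-type=DerivO3CPU", "-n", "1", "--mem-size=16GB",
        "--caches", "--l2cache", "--l2_size=" + l2_size,
        # Assumption: restore from the first checkpoint in the output dir.
        "-r", str(restore_index),
    ]


if __name__ == "__main__":
    argv = build_trace_command(["ExecAll"], "exec.trace", "8MB")
    # Print a copy-pasteable shell command.
    print(" ".join(shlex.quote(a) for a in argv))
```

An ExecAll trace grows very quickly, so this is probably only practical for a short window around the faulting PC.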
_______________________________________________ gem5-users mailing list [email protected] http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
