Yes. To make sure all buffers are flushed before taking your checkpoint, you can call the “sync” command, which should already be installed on the image. You’ll need to call sync before the commands that halt the simulation and take the checkpoint.
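As a rough illustration of that ordering, here is a minimal guest-side sketch. It assumes Python 3 is available inside the disk image and that gem5's m5 utility is reachable as "m5"; both are assumptions, and the helper name sync_then_checkpoint is hypothetical. A plain shell script that runs sync followed by the m5 command would do the same job.

```python
import os
import subprocess


def sync_then_checkpoint(m5_binary="m5", run=None, sync=None):
    """Flush filesystem buffers, then ask gem5 to take a checkpoint.

    m5_binary is an assumed name/path for gem5's guest utility.
    run and sync can be overridden so the ordering can be tested
    outside a simulated guest.
    """
    run = run or subprocess.run
    sync = sync or os.sync
    # Same effect as the `sync` command: write dirty buffers to disk first.
    sync()
    # Only after the flush, trigger the checkpoint from inside the guest.
    return run([m5_binary, "checkpoint"], check=True)
```

The point of the sketch is only the order of operations: the flush has to complete before the checkpoint request is issued.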
This page explains how I did the same for an Android disk image: http://gem5.org/BBench-gem5#Tips_for_Making_Your_Disk_Image_gem5_Friendly

-Tony

From: gem5-users <[email protected]> On Behalf Of Da Zhang
Sent: Thursday, July 19, 2018 12:00 PM
To: gem5 users mailing list <[email protected]>
Subject: Re: [gem5-users] dacapo (java) benchmark suite encounters "SIGSEGV" and "null exception" during timing mode (fs mode) after restarting from a checkpoint

Hey Gutierrez,

By "sync the disk image", do you mean making sure all disk modifications are actually written to the disk (up to date) before taking the checkpoint? How do I do that?

I haven't tried taking a checkpoint with the COW layer disabled and then restarting from that checkpoint. All I have done is press "ctrl+c" to stop gem5 and take the checkpoint (--checkpoint-at-end); I rely on gem5 to take care of everything that needs to be handled when taking checkpoints.

Best,
Da Zhang

On Thu, Jul 19, 2018 at 2:36 PM Gutierrez, Anthony <[email protected]> wrote:

JIT was precisely the issue I thought was causing this. One thing that may be necessary is to ensure you sync the disk image before taking your checkpoint.

gem5’s debug flags should help you identify something like a hang, for example an ExecAll trace. A SyscallAll trace would most likely help you understand better what the JIT is doing.

From: gem5-users <[email protected]> On Behalf Of Da Zhang
Sent: Thursday, July 19, 2018 11:15 AM
To: gem5 users mailing list <[email protected]>
Subject: Re: [gem5-users] dacapo (java) benchmark suite encounters "SIGSEGV" and "null exception" during timing mode (fs mode) after restarting from a checkpoint

Thanks for the suggestions. I have been trying a couple of solutions (I only tested a small subset of the DaCapo benchmark suite, the part that encounters segfaults with O3CPU):

1. using TimingSimpleCPU: no segfaults
2. disable the COW layer and write to the disk image when taking the checkpoint: there are still segfaults
3. take checkpoints with the JIT compiler disabled (20x slowdown): no segfaults
4. take checkpoints during atomic mode (without warming up the JIT): no segfaults
5. take checkpoints with Java compressed OOPs disabled: there are still segfaults

One thing that I can't tell is whether the benchmark hangs, since nothing is printed during execution. Is there a statistic I can use to tell if the benchmark hangs?

So far, all my experiments run with 1 CPU (even though some benchmarks are multithreaded). I attempted to take some checkpoints with more CPUs using the KVM CPU, but unfortunately I got some "rcu_sched self-detected stall on CPU" issues. Any idea?

On Mon, Jul 16, 2018 at 5:47 PM Gutierrez, Anthony <[email protected]> wrote:

Da,

Do you encounter the segfault only when restoring from a checkpoint? That is, if you do not use checkpoints, can any DaCapo benchmark successfully complete under one of the simple CPU models (and not just KVM CPU)? If so, you may want to get a syscall trace (e.g., using strace) to see what sorts of files the JVM is trying to read.

It’s possible that the VM generates some files that it will read back later. If you use checkpoints, due to the disk image COW layer, I do not believe any disk updates are checkpointed, so these files will not persist, which could lead to some weird segfault issues. Not sure if this is happening in your case, but it may be worth investigating.

I created some of the original Android disk images, and the original DaCapo image, and at that time I would typically run the benchmarks through FS mode with the Atomic CPU once, with the COW layer disabled, in order to generate the needed files on the disk image and have them persist. This was entirely for performance, to prevent the VMs from regenerating the same files for each run, but I can envision it causing issues during runtime as well.
In particular, it seems your code is faulting while doing some XML serializing/deserializing; perhaps the XML file it is looking for is gone? Beyond that, assuming it is a real bug in gem5, I would recommend an ExecAll trace to figure out why the instruction at that PC is faulting.

-Tony

From: gem5-users [mailto:[email protected]] On Behalf Of Da Zhang
Sent: Monday, July 16, 2018 1:50 PM
To: gem5 users mailing list <[email protected]>
Subject: Re: [gem5-users] dacapo (java) benchmark suite encounters "SIGSEGV" and "null exception" during timing mode (fs mode) after restarting from a checkpoint

Hey Jason,

There are a bunch of "warn: instruction 'prefetch_nta' unimplemented" messages in atomic mode, during which the Java benchmarks don't crash. However, there are no such warnings during timing mode. Does that imply that unimplemented instructions don't cause the problem? Any clues or suggestions for debugging these problems?

best,
Da Zhang

On Mon, Jul 16, 2018 at 1:32 PM Jason Lowe-Power <[email protected]> wrote:

Hello,

Are you seeing any warnings like "warn: Instruction XXX not implemented"? There are many X86 SIMD instructions that are currently unimplemented. I would bet that your application is using some of those instructions and getting 0's as the output instead of the correct value.

The "right" way to solve this problem is to implement these instructions (and we would really appreciate it if you contributed your fixes back on https://gem5-review.googlesource.com). The other option is to recompile your applications without SIMD extensions (e.g., -march=athlon64 or whatever is the original x86-64 name in GCC). However, this likely requires compiling all of the Java runtime in your case.

Cheers,
Jason

On Mon, Jul 16, 2018 at 10:11 AM Da Zhang <[email protected]> wrote:

To clarify, the "SIGSEGV and null exceptions" happen to the benchmark suite, not gem5.
gem5 is running without errors. But in the system.pc.com_1.device files, I observe that most of the benchmarks crash due to SIGSEGV or null exceptions. Example:

"
x/system.pc.com_1.device buffers
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    # SIGSEGV (0xb) at pc=0x00007f81d17742b7, pid=1474, tid=0x00007f81cf46d700
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_171-b11) (build 1.8.0_171-b11)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.171-b11 mixed mode linux-amd64 compressed oops)
    # Problematic frame:
    # J 1815 C2 org.apache.xml.serializer.ToHTMLStream.endElement(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)V (389 bytes) @ 0x00007f81d17742b7 [0x00007f81d1774280+0x37]
    #
    #
"

On Mon, Jul 16, 2018 at 11:39 AM Da Zhang <[email protected]> wrote:

Hey guys,

I am testing a Java benchmark suite, DaCapo, on gem5 in fs mode. Unfortunately, I encounter a lot of SIGSEGV and null exceptions during timing mode after restarting from checkpoints.

I am using Linux kernel v4.8.13 and ubuntu-server-16.04.1 with Oracle JDK v8.0_171-b11. To eliminate the influence of my modifications to gem5 src/ and configs/, I re-downloaded gem5 and checked out commit "ee2ffdc0fdb489767768e5273a4ccd7b51735c7c", which is the gem5 version I am working on. The checkpoint was taken using the KVM CPU with 1 CPU and 16GB memory.

For the simulation, I use build/X86/gem5.opt (in order to enable assertions) with fs mode (configs/example/fs.py). Other options include "--cpu-type=DerivO3CPU -n 1 --mem-size=16GB --caches --l2cache --l2_size=${L2SIZE}" (I tried L2SIZE from 256KB to 8MB). I tested with 100ms warmup and 1ps real simulation time, and no errors appeared. But with a longer real simulation time, the benchmark suite crashes with a segfault.

I am able to run the DaCapo benchmark suite in fs mode with the KVM CPU, without any segfaults or exceptions.
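Because gem5 itself exits cleanly, the crashes only show up in the guest terminal output. The following is a minimal sketch, not part of the original setup, of how the HotSpot crash report quoted earlier in this message could be picked out of a system.pc.com_1.device file automatically. The function name find_jvm_crashes and the returned field names are hypothetical, and the parsing relies only on the report layout shown in the example above.

```python
import re

# Header line that HotSpot prints at the top of a fatal error report.
FATAL_MARKER = "A fatal error has been detected by the Java Runtime Environment"
# Matches e.g. "SIGSEGV (0xb) at pc=0x00007f81d17742b7"
SIGNAL_RE = re.compile(r"(SIG[A-Z]+)\s+\((0x[0-9a-fA-F]+)\)\s+at pc=(0x[0-9a-fA-F]+)")


def find_jvm_crashes(text):
    """Return one dict per JVM fatal-error report found in text."""
    crashes = []
    current = None
    want_frame = False
    for line in text.splitlines():
        if FATAL_MARKER in line:
            # Start a new record for each report header.
            current = {"signal": None, "pc": None, "frame": None}
            crashes.append(current)
            want_frame = False
            continue
        if current is None:
            continue
        body = line.lstrip("# \t").rstrip()
        match = SIGNAL_RE.search(line)
        if match and current["signal"] is None:
            current["signal"] = match.group(1)
            current["pc"] = match.group(3)
        elif "Problematic frame:" in line:
            # The frame description is on the next non-empty report line.
            want_frame = True
        elif want_frame and body:
            current["frame"] = body
            want_frame = False
    return crashes
```

Reading each terminal output file and passing its contents to find_jvm_crashes gives a quick per-benchmark summary of which runs died, with which signal, and in which compiled frame.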
I have also tested some simple Java benchmarks; neither segfaults nor exceptions occur. Does anyone have suggestions or experience with these issues?

best,
Da Zhang

_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
