Thanks for the input, Paul. No need to apologize for being pro-marss; different tools have different strengths and there's no point in someone using gem5 if it's not the best tool for the job. (It's not like we're losing revenue...) I'm also very interested in their perceived strengths---a little friendly competition keeps us all on our toes.
As for multithreading, gem5 is also single-threaded, so you'll typically see linear slowdown when modeling MP systems as well. We have plans to multithread the simulation engine, but much time has passed and they still remain mostly just plans, so don't hold your breath on that.

Steve

On Wed, Oct 24, 2012 at 8:43 AM, Paul Rosenfeld <dramnin...@gmail.com> wrote:

> Another thing to note is that in the master branch of marss, you can
> expect the slowdown for running more cores to be pretty much linear (not
> sure if this is the case with gem5). QEMU emulates each core in sequence so
> as you add cores, the simulation time goes up linearly. They do have some
> experimental extensions for multithreading the core execution, but I'm not
> sure how much speedup you can claw back (I haven't used it myself).
>
> Overall, I think the decision comes down to how flexible your modelling
> needs are. If you need to run your experiment across multiple ISAs or on
> some non-x86 ISA (or if you need the full coherence modeling power of
> ruby), I'd say gem5 is your best bet. However, for x86 simulation, I
> personally tend to lean towards marss -- it has the benefit of picking one
> target and trying to do it well, which can make it much easier to
> understand how to change the code.
>
> One more thing to consider is that if your simulation is device-centric
> (hard disk, network card), you might want to find out the finer points of
> how marss handles these things. IIRC, since QEMU handles device emulation,
> it might be a bit difficult to get good simulation data on the effects of
> things like NICs and disks without doing some work first.
>
> Also, to comment on Steve's point about the level of CPU model detail
> being the same, that is also another difference between marss and gem5:
> there isn't really a way to do a functional simulation in marss -- you're
> pretty much always stuck with the full-on detailed model.
> They have an out
> of order model which models the full superscalar out of order pipeline and
> they have a simple "Intel Atom-like" model which is much simpler (in
> order), but that's pretty much the only knob you get in terms of detail.
>
> I'd agree with Steve's point about the boot time being a smaller issue
> since both simulators support the "checkpoint at the start of the
> simulation" option. That said, I found myself screwing around with the
> actual disk images and benchmarks more than I thought I'd have to (mostly
> in things like tweaking the parameters to benchmarks, trying to write new
> micro benchmarks that would inevitably end up doing something incorrectly
> and I'd have to recompile them and re-checkpoint them).
>
> I hope I don't sound like I'm a marss cheerleader, but since you asked
> this question on the gem5 list, I feel like someone should try to balance
> out the picture a bit.
>
> -Paul
>
> On Wed, Oct 24, 2012 at 11:18 AM, Steve Reinhardt <ste...@gmail.com> wrote:
>
>> Thanks for the benchmarking effort, Ben. These are interesting numbers,
>> but before people read too much into them I thought I'd throw out some
>> caveats:
>>
>> - A much better way to measure slowdown is to compare the execution time
>> in the simulator with the execution time on a real system. The reported
>> simulated runtime (i.e., what you're getting from running 'time' in the
>> simulator itself) reflects whatever configuration you're modeling, which
>> may or may not be realistic (and if you're not running a detailed timing
>> model, it's unlikely to be realistic). That is, the wall-clock simulation
>> runtime is going to be the same whether I configure the simulated CPU to
>> run at a simulated 2 GHz or 2 kHz, but the slowdown/speedup as you've
>> calculated it would be very different.
>>
>> - OS boot speed is a useful number to have, but not a representative
>> workload for looking at typical simulation jobs.
>> Generally when people use
>> FS mode in gem5 they boot the OS once, take a checkpoint, and run their
>> simulations from there. Also, though I haven't run FS mode myself
>> recently, 23 minutes sounds extremely slow; my recollection is that boot is
>> pretty fast (just a few minutes). Part of that is also that we typically
>> boot a more stripped-down image and not a full install (which is typically
>> unnecessary for benchmarking). Also, there are delay loops that we skip
>> that might not be properly skipped if you're using a different kernel image.
>>
>> - You need to make sure that the level of detail of the simulation model
>> is the same in both cases, and probably do a comparison at multiple levels
>> (e.g., fast functional simulation vs. detailed out-of-order CPU and caches).
>>
>> I don't mean to sound overly critical or like I'm making excuses... I
>> expect MARSS probably is faster than gem5, particularly for fast functional
>> simulation, because they seem to focus a lot on speed while we focus more
>> on flexibility and modularity. (Though there has been some work on using
>> KVM to provide extremely fast functional modeling for gem5, which should
>> make up a lot of the difference for that mode of operation.) I just want
>> to make sure that the comparisons are fair and meaningful.
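
The first caveat in the list above can be made concrete with a small sketch. It assumes a hypothetical workload that takes a fixed number of simulated cycles and a fixed amount of host wall-clock time regardless of the configured clock, which is a simplification; the function names and any numbers fed to them are illustrative and do not come from gem5 or MARSS.

```c
#include <assert.h>

/* Simulated seconds for a workload of `cycles` cycles on a CPU
 * configured at `clock_hz` (hypothetical, simplified model). */
double simulated_seconds(double cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return cycles / clock_hz;
}

/* Slowdown measured against simulated time: host wall-clock seconds
 * divided by simulated seconds. Depends on the configured clock. */
double slowdown_vs_simulated(double wall_s, double cycles, double clock_hz)
{
    return wall_s / simulated_seconds(cycles, clock_hz);
}

/* Slowdown measured against a real machine: host wall-clock seconds
 * divided by the natively measured run time. Does not depend on the
 * simulated clock at all. */
double slowdown_vs_native(double wall_s, double native_s)
{
    assert(native_s > 0.0);
    return wall_s / native_s;
}
```

For example, with 240 s of wall-clock time and a made-up 4e8 cycles, `slowdown_vs_simulated` gives about 1200 at 2e9 Hz but about 0.0012 at 2e3 Hz, while `slowdown_vs_native` is unchanged because it never looks at the simulated clock.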
>>
>> Thanks,
>>
>> Steve
>>
>> On Wed, Oct 24, 2012 at 7:46 AM, Payne, Benjamin <bpa...@lps.umd.edu> wrote:
>>
>>> Prompted by Hamid's question about simulation speed comparison with
>>> MARSS, I wrote a small benchmark (see bottom of this email), then compiled
>>> and ran it within the gem5 full system emulation using the disk image
>>> http://www.gem5.org/dist/current/arm/arm-system-2011-08.tar.bz2
>>> The gem5 configuration is with all the defaults,
>>> build/ARM/gem5.opt configs/example/fs.py
>>> --disk-image=/home/bpayne/full_system_for_gem5/disks/arm-ubuntu-natty-headless.img
>>>
>>> The boot time for full simulation mode (how long until I'm at the login
>>> terminal via telnet) is 23 minutes.
>>>
>>> In full simulation mode, I see the following output (my binary is called
>>> "a.out")
>>>
>>> root@gem5sim:~# date; time ./a.out; date
>>> date; time ./a.out; date
>>> Wed Dec 31 20:49:26 CST 1969
>>> CPU time= 0.210000 seconds
>>> real 0m0.216s
>>> user 0m0.060s
>>> sys 0m0.150s
>>> Wed Dec 31 20:49:27 CST 1969
>>> root@gem5sim:~#
>>>
>>> The wall clock time (how long I wait for the simulated system) is about
>>> 4 minutes. Thus the slowdown is a factor of (4*60)/.2=1200, which is
>>> consistent with previous runs I've done.
>>>
>>> Next I ran the same code in syscall emulation mode, cross compiled using
>>> Linaro for ARM. This took 168 seconds of wall clock time and 0.07 seconds
>>> of simulated time, a ratio of (2*60+48)/0.07=2400 [twice as fast as full
>>> system emulation!]. I repeated the same measure with bench.c cross-compiled
>>> for ARM using Mentor Graphics Sourcery Tools. The syscall emulation took
>>> 162 wall clock seconds and 0.06 simulation seconds, a ratio of 2700. [These
>>> numbers may be somewhat inaccurate due to the low simulation time.] Below
>>> is how I captured the times in syscall emulation mode.
>>>
>>> bpayne@bpayne-VirtualBox64:~/gem5$ date; time build/ARM/gem5.opt
>>> configs/example/se.py -c
>>> tests/test-progs/bens_benchmark/bin/arm/bench_linaro.lex ; date
>>> Wed Oct 24 08:24:13 EDT 2012
>>> gem5 Simulator System. http://gem5.org
>>> gem5 is copyrighted software; use the --copyright option for details.
>>> gem5 compiled Oct 16 2012 13:57:10
>>> gem5 started Oct 24 2012 08:24:13
>>> gem5 executing on bpayne-VirtualBox64
>>> command line: build/ARM/gem5.opt configs/example/se.py -c
>>> tests/test-progs/bens_benchmark/bin/arm/readwrite_linaro.lex
>>> Global frequency set at 1000000000000 ticks per second
>>> 0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
>>> **** REAL SIMULATION ****
>>> info: Entering event queue @ 0. Starting simulation...
>>> CPU time= 0.070000 seconds
>>> hack: be nice to actually delete the event here
>>> Exiting @ tick 73563381000 because target called exit()
>>> real 2m48.492s
>>> user 1m47.319s
>>> sys 0m2.792s
>>> Wed Oct 24 08:27:01 EDT 2012
>>> bpayne@bpayne-VirtualBox64:~/gem5$
>>>
>>> **************************************
>>>
>>> Next I ran the same bench.c code in MARSS using the system image
>>> http://bertha.cs.binghamton.edu/downloads/ubuntu-natty.tar.bz2
>>>
>>> The boot time for full simulation mode of MARSS (how long until I'm at
>>> the login terminal via VNC) is 42 seconds (33 times faster than gem5).
>>> I compiled a static binary of bench.c and ran it in MARSS:
>>>
>>> root@ubuntu:~# date; time ./bench.lex ; date
>>> Wed Oct 24 14:38:25 UTC 2012
>>> CPU time= 8.66 seconds
>>> real 0m8.752s
>>> user 0m1.200s
>>> sys 0m7.490s
>>> Wed Oct 24 14:38:34 UTC 2012
>>> root@ubuntu:~#
>>>
>>> The wall clock time for this simulation is roughly 9 seconds. The CPUs
>>> are different, so it doesn't make sense to compare MARSS's 8.66 seconds to
>>> gem5's 0.07 seconds. What is relevant is the slowdown factor -- 1 for
>>> MARSS, between 1200 and 2700 for gem5.
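
The ratios quoted above can be recomputed with a few lines of C. This is only a sketch of the arithmetic: the `struct run` type, the `slowdown` helper, and the labels are hypothetical names introduced here, and the figures are the rounded ones given in the post (240 s / 0.2 s, 168 s / 0.07 s, 162 s / 0.06 s, 9 s / 8.66 s).

```c
#include <assert.h>

/* One measurement: host wall-clock seconds and the run time that the
 * guest itself reported for the same run. */
struct run {
    const char *label;
    double wall_s;
    double guest_s;
};

/* Ratio used in the post: wall-clock seconds per guest-reported second.
 * A larger value means more host time per unit of simulated time. */
double slowdown(const struct run *r)
{
    assert(r->guest_s > 0.0);
    return r->wall_s / r->guest_s;
}

/* Figures as quoted in the post (rounded there, not re-measured). */
const struct run runs[] = {
    { "gem5 FS",            240.0, 0.2  },
    { "gem5 SE (Linaro)",   168.0, 0.07 },
    { "gem5 SE (Sourcery)", 162.0, 0.06 },
    { "MARSS",                9.0, 8.66 },
};
```

Calling `slowdown(&runs[i])` for each entry gives roughly 1200, 2400, 2700, and 1.04. The syscall emulation runs finish sooner in wall-clock terms than the full system run, but their ratio is higher because the guest-reported time is much smaller, so the ratio by itself does not say which mode is faster.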
>>>
>>> **************************************
>>>
>>> All of these timings were carried out in Ubuntu 12.04 64bit running in a
>>> single CPU VirtualBox, hosted by Ubuntu 12.04 64bit with Intel Core i7 930
>>> @ 2.80 GHz. The host system has 6GB of RAM, and the VirtualBox has 2GB.
>>>
>>> "bench.c" is a program to load the CPU and file I/O
>>>
>>> /* benchmark
>>>  * 20121018
>>>  * Ben Payne
>>>  * load CPU and file I/O
>>>  */
>>>
>>> #include <stdio.h>
>>> #include <time.h>
>>> int main(void)
>>> {
>>>     int number_of_computes;
>>>     int number_of_read_writes;
>>>     int number_of_iterations;
>>>     int iteration_indx;
>>>     int read_write_indx;
>>>     int compute_indx;
>>>     int valu;
>>>     int temp_read;
>>>     clock_t time_start, time_end;
>>>     double cpuTime;
>>>     FILE *outfile;
>>>     FILE *infile;
>>>     time_start = clock();
>>>     number_of_computes=500;
>>>     number_of_read_writes=100;
>>>     number_of_iterations=100;
>>>
>>>     for (iteration_indx = 1; iteration_indx <= number_of_iterations ; iteration_indx++)
>>>     {
>>>         for (read_write_indx = 1; read_write_indx <= number_of_read_writes ; read_write_indx++)
>>>         {
>>>             /* append to the file (create it if it does not exist) */
>>>             outfile = fopen("out.dat","a+");
>>>             fprintf(outfile,"%d\n",read_write_indx); /* writes */
>>>             fclose(outfile);
>>>             for (compute_indx = 1; compute_indx <= number_of_computes ; compute_indx++)
>>>             {
>>>                 valu=(compute_indx+1)*23;
>>>             }
>>>             infile = fopen("out.dat","r");
>>>             fscanf(infile,"%d",&temp_read);
>>>             fclose(infile);
>>>         }
>>>     }
>>>     time_end = clock();
>>>     cpuTime= ((double)(time_end-time_start))/ (CLOCKS_PER_SEC);
>>>     printf("CPU time= %f seconds\n",cpuTime);
>>>     return 0;
>>> }
>>>
>>> From: gem5-users-boun...@gem5.org [mailto:gem5-users-boun...@gem5.org]
>>> On Behalf Of Hamid Reza Khaleghzadeh
>>> Sent: Tuesday, October 23, 2012 10:26 AM
>>> To: gem5 users mailing list
>>> Subject: Re: [gem5-users] gem5 versus MARSS
>>>
>>> Thanks for your answer.
>>> Ruby is a module in GEM5 which simulates the memory
>>> hierarchy. Suppose there is an application whose execution time is 20 ms
>>> on a real system. GEM5 simulates the application in about 15 min. What is
>>> the MARSS86 simulation speed?
>>>
>>> On Tue, Oct 23, 2012 at 5:27 PM, Payne, Benjamin <bpa...@lps.umd.edu>
>>> wrote:
>>> Hello,
>>>
>>> I'm not familiar with what you are referring to by the ruby module - is
>>> that an addon for Gem5?
>>>
>>> You have a good question, but how would I quantify the difference in
>>> simulation speeds between MARSS and Gem5? Is there an established benchmark
>>> to run?
>>>
>>> Kindly,
>>>
>>> Ben Payne
>>>
>>> From: gem5-users-boun...@gem5.org [mailto:gem5-users-boun...@gem5.org]
>>> On Behalf Of Hamid Reza Khaleghzadeh
>>> Sent: Tuesday, October 23, 2012 9:31 AM
>>> To: gem5 users mailing list
>>> Subject: Re: [gem5-users] gem5 versus MARSS
>>>
>>> I have a question about MARSS. As you know, GEM5 simulation speed with
>>> the ruby module is very slow. May I know the MARSS simulation speed?
>>>
>>> Thanks
>>>
>>> On Tue, Oct 23, 2012 at 2:26 AM, Andreas Hansson <
>>> andreas.hans...@arm.com> wrote:
>>> Hi Benjamin,
>>>
>>> The list is long. gem5 has (amongst other things):
>>>
>>> a variety of CPU models that are orthogonal to the ISA, atomic for
>>> speed, in-order and O3 for detailed uarch models
>>>
>>> BSD license (thus both academia and companies involved and contributing)
>>>
>>> full-system ready-to-run Android disk images and configurations, not
>>> just your average chip-multi-processor, but also heterogeneous
>>> application-processor-like systems with state-of-the-art CPU models
>>>
>>> a very active (and large) user community
>>>
>>> Ultimately using one or the other really depends on what problem it is
>>> you want to address.
>>>
>>> Andreas
>>>
>>> From: <Payne>, Benjamin <bpa...@lps.umd.edu>
>>> Reply-To: gem5 users mailing list <gem5-users@gem5.org>
>>> Date: Monday, 22 October 2012 22:06
>>> To: "gem5-users@gem5.org" <gem5-users@gem5.org>
>>> Subject: [gem5-users] gem5 versus MARSS
>>>
>>> Hello,
>>>
>>> What is the difference between gem5
>>> http://gem5.org/Main_Page
>>> and MARSS (Micro-ARchitectural and System Simulator for x86-based
>>> Systems)
>>> http://marss86.org/~marss86/index.php/Home
>>>
>>> As far as I can tell,
>>> -gem5 can support Alpha, ARM, SPARC, and x86 instruction set
>>> architectures, whereas MARSS is only for x86.
>>> -gem5 can be integrated into Structural Simulation Toolkit, whereas
>>> MARSS has not been
>>> -both gem5 and MARSS can simulate multiple cores
>>> -both gem5 and MARSS can use DRAMSim2
>>>
>>> Please correct me if any of these statements are incorrect.
>>>
>>> Are there any other considerations?
>>>
>>> Thank you,
>>>
>>> Ben Payne
>>> http://mst.edu/~bhpxc9/
>>> Suite 450, Room S452
>>> 5520 Research Park Drive
>>> Catonsville, MD 21228-4870
>>> Laboratory for Physical Sciences
>>> http://www.lps.umd.edu/
>>> office: 443-654-7890
>>> cell: 608-308-2413
_______________________________________________ gem5-users mailing list gem5-users@gem5.org http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users