Thanks to everyone for your helpful responses. I know the timing comparisons 
aren't that meaningful as measurements, but they are what a default setup of 
MARSS and gem5 produces out of the box. (I haven't started moving towards 
simulating real systems yet, so I didn't bother tuning the configurations.)

Ben

From: gem5-users-boun...@gem5.org [mailto:gem5-users-boun...@gem5.org] On 
Behalf Of Steve Reinhardt
Sent: Wednesday, October 24, 2012 12:14 PM
To: gem5 users mailing list
Subject: Re: [gem5-users] gem5 versus MARSS

Thanks for the input, Paul.  No need to apologize for being pro-marss; 
different tools have different strengths and there's no point in someone using 
gem5 if it's not the best tool for the job.  (It's not like we're losing 
revenue...)  I'm also very interested in their perceived strengths---a little 
friendly competition keeps us all on our toes.

As far as multithreading goes, gem5 is also single-threaded, so you'll typically 
see linear slowdown when modeling MP systems as well. We have plans to 
multithread the simulation engine, but they have remained mostly just plans for 
quite a while, so don't hold your breath on that.

Steve
On Wed, Oct 24, 2012 at 8:43 AM, Paul Rosenfeld 
<dramnin...@gmail.com> wrote:
Another thing to note is that in the master branch of marss, you can expect the 
slowdown for running more cores to be pretty much linear (not sure if this is 
the case with gem5). QEMU emulates each core in sequence so as you add cores, 
the simulation time goes up linearly. They do have some experimental extensions 
for multithreading the core execution, but I'm not sure how much speedup you 
can claw back (I haven't used it myself).

Overall, I think the decision comes down to how flexible your modelling needs 
are. If you need to run your experiment across multiple ISAs or on some non-x86 
ISA (or if you need the full coherence modeling power of ruby), I'd say gem5 is 
your best bet. However, for x86 simulation, I personally tend to lean towards 
marss -- it has the benefit of picking one target and trying to do it well, 
which can make it much easier to understand how to change the code.

One more thing to consider is that if your simulation is device-centric (hard 
disk, network card), you might want to find out the finer points of how marss 
handles these things. IIRC, since QEMU handles device emulation, it might be a 
bit difficult to get good simulation data on the effects of things like NICs 
and disks without doing some work first.

Also, to comment on Steve's point about the level of CPU model detail being the 
same: that is another difference between marss and gem5. There isn't really a 
way to do a functional simulation in marss -- you're pretty much always stuck 
with the full-on detailed model. They have an out-of-order model which models 
the full superscalar out-of-order pipeline, and a simpler "Intel Atom-like" 
in-order model, but that's pretty much the only knob you get in terms of detail.
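
(For comparison, in gem5 the level of detail is just a command-line knob; if 
I'm remembering the option names right, something along these lines switches 
between the fast functional model and the detailed out-of-order model with 
caches:

build/ARM/gem5.opt configs/example/se.py --cpu-type=atomic -c ./bench.lex
build/ARM/gem5.opt configs/example/se.py --cpu-type=detailed --caches -c ./bench.lex

so the same config script covers both ends of the speed/detail trade-off.)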

I'd agree with Steve's point about boot time being a smaller issue, since both 
simulators support the "checkpoint at the start of the simulation" option. That 
said, I found myself screwing around with the actual disk images and benchmarks 
more than I thought I'd have to (mostly tweaking benchmark parameters and 
writing new microbenchmarks that would inevitably do something incorrectly, so 
I'd have to recompile and re-checkpoint them).

I hope I don't sound like I'm a marss cheerleader, but since you asked this 
question on the gem5 list, I feel like someone should try to balance out the 
picture a bit.

-Paul



On Wed, Oct 24, 2012 at 11:18 AM, Steve Reinhardt 
<ste...@gmail.com> wrote:
Thanks for the benchmarking effort, Ben.  These are interesting numbers, but 
before people read too much into them I thought I'd throw out some caveats:

- A much better way to measure slowdown is to compare the execution time in the 
simulator with the execution time on a real system.  The reported simulated 
runtime (i.e., what you're getting from running 'time' in the simulator itself) 
reflects whatever configuration you're modeling, which may or may not be 
realistic (and if you're not running a detailed timing model, it's unlikely to 
be realistic).  That is, the wall-clock simulation runtime is going to be the 
same whether I configure the simulated CPU to run at a simulated 2 GHz or 2 
kHz, but the slowdown/speedup as you've calculated it would be very different.
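
(To make that concrete with made-up numbers: if a.out takes 0.05 s on real ARM 
hardware and the simulator needs 240 s of wall-clock time to run it, the 
slowdown is 240 / 0.05 = 4800, and that number doesn't change whether the 
simulated CPU was configured at 2 GHz or 2 kHz.)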

- OS boot speed is a useful number to have, but not a representative workload 
for looking at typical simulation jobs.  Generally when people use FS mode in 
gem5 they boot the OS once, take a checkpoint, and run their simulations from 
there.  Also, though I haven't run FS mode myself recently, 23 minutes sounds 
extremely slow; my recollection is that boot is pretty fast (just a few 
minutes).  Part of that is also that we typically boot a more stripped-down 
image and not a full install (which is typically unnecessary for benchmarking). 
 Also, there are delay loops that we skip that might not be properly skipped if 
you're using a different kernel image.
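
(For anyone who hasn't used checkpoints before, the workflow is roughly the 
following -- from memory, so check the Checkpoints page on the gem5 wiki for the 
exact option names:

# inside the simulated system, once boot has finished:
m5 checkpoint
# then, on the host, restore from that checkpoint instead of re-booting:
build/ARM/gem5.opt configs/example/fs.py -r 1 <same fs.py arguments as before>

The m5 utility lives in util/m5 in the source tree and needs to be built for 
the target ISA.)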

- You need to make sure that the level of detail of the simulation model is the 
same in both cases, and probably do a comparison at multiple levels (e.g., fast 
functional simulation vs. detailed out-of-order CPU and caches).

I don't mean to sound overly critical or like I'm making excuses... I expect 
MARSS probably is faster than gem5, particularly for fast functional 
simulation, because they seem to focus a lot on speed while we focus more on 
flexibility and modularity.  (Though there has been some work on using KVM to 
provide extremely fast functional modeling for gem5, which should make up a lot 
of the difference for that mode of operation.)  I just want to make sure that 
the comparisons are fair and meaningful.

Thanks,

Steve

On Wed, Oct 24, 2012 at 7:46 AM, Payne, Benjamin 
<bpa...@lps.umd.edu> wrote:
Prompted by Hamid's question about the simulation speed comparison with MARSS, I 
wrote a small benchmark (see the bottom of this email), then compiled and ran it 
within gem5 full-system emulation using the disk image
http://www.gem5.org/dist/current/arm/arm-system-2011-08.tar.bz2
The gem5 configuration uses all the defaults:
build/ARM/gem5.opt configs/example/fs.py 
--disk-image=/home/bpayne/full_system_for_gem5/disks/arm-ubuntu-natty-headless.img

The boot time for full-system mode (how long until I'm at the login terminal 
via telnet) is 23 minutes.

In full-system mode, I see the following output (my binary is called "a.out"):

root@gem5sim:~# date; time ./a.out; date
date; time ./a.out; date
Wed Dec 31 20:49:26 CST 1969
CPU time= 0.210000 seconds
real    0m0.216s
user    0m0.060s
sys     0m0.150s
Wed Dec 31 20:49:27 CST 1969
root@gem5sim:~#

The wall clock time (how long I wait for the simulated system) is about 4 
minutes. Thus the slowdown is a factor of (4*60)/.2=1200, which is consistent 
with previous runs I've done.

Next I ran the same code in syscall emulation mode, cross-compiled for ARM using 
Linaro. This took 168 seconds of wall clock time and 0.07 seconds of simulated 
time, a ratio of (2*60+48)/0.07=2400 [twice the slowdown of full-system 
emulation!]. I repeated the same measurement with bench.c cross-compiled for ARM 
using Mentor Graphics Sourcery Tools. The syscall emulation took 162 wall clock 
seconds and 0.06 simulated seconds, a ratio of 2700. [These numbers may be 
somewhat inaccurate due to the short simulated runtime.] Below is how I captured 
the times in syscall emulation mode.

bpayne@bpayne-VirtualBox64:~/gem5$ date; time build/ARM/gem5.opt 
configs/example/se.py -c  
tests/test-progs/bens_benchmark/bin/arm/bench_linaro.lex ; date
Wed Oct 24 08:24:13 EDT 2012
gem5 Simulator System.  http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 compiled Oct 16 2012 13:57:10
gem5 started Oct 24 2012 08:24:13
gem5 executing on bpayne-VirtualBox64
command line: build/ARM/gem5.opt configs/example/se.py -c 
tests/test-progs/bens_benchmark/bin/arm/readwrite_linaro.lex
Global frequency set at 1000000000000 ticks per second
0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
**** REAL SIMULATION ****
info: Entering event queue @ 0.  Starting simulation...
CPU time= 0.070000 seconds
hack: be nice to actually delete the event here
Exiting @ tick 73563381000 because target called exit()
real    2m48.492s
user    1m47.319s
sys     0m2.792s
Wed Oct 24 08:27:01 EDT 2012
bpayne@bpayne-VirtualBox64:~/gem5$

**************************************

Next I ran the same bench.c code in MARSS using the system image
http://bertha.cs.binghamton.edu/downloads/ubuntu-natty.tar.bz2

The boot time for full-system simulation in MARSS (how long until I'm at the 
login terminal via VNC) is 42 seconds (33 times faster than gem5).
I compiled a static binary of bench.c and ran it in MARSS:

root@ubuntu:~# date; time ./bench.lex ; date
Wed Oct 24 14:38:25 UTC 2012
CPU time= 8.66 seconds
real   0m8.752s
user  0m1.200s
sys 0m7.490s
Wed Oct 24 14:38:34 UTC 2012
root@ubuntu:~#

The wall clock time for this simulation is roughly 9 seconds. The simulated CPUs 
are different, so it doesn't make sense to compare MARSS's 8.66 seconds to gem5's 
0.07 seconds. What is relevant is the slowdown factor -- 1 for MARSS, between 
1200 and 2700 for gem5.

**************************************

All of these timings were carried out in Ubuntu 12.04 64-bit running in a 
single-CPU VirtualBox VM, hosted by Ubuntu 12.04 64-bit on an Intel Core i7 930 
@ 2.80 GHz. The host system has 6 GB of RAM, and the VM has 2 GB.

"bench.c" is a program to load the CPU and file I/O

/* benchmark
 * 20121018
 * Ben Payne
 * load CPU and file I/O
 */

#include <stdio.h>
#include <time.h>
int main(void)
{
  int number_of_computes;
  int number_of_read_writes;
  int number_of_iterations;
  int iteration_indx;
  int read_write_indx;
  int compute_indx;
  volatile int valu;  /* volatile so the compute loop is not optimized away */
  int temp_read;
  clock_t time_start, time_end;
  double cpuTime;
  FILE *outfile;
  FILE *infile;

  time_start = clock();
  number_of_computes = 500;
  number_of_read_writes = 100;
  number_of_iterations = 100;

  for (iteration_indx = 1; iteration_indx <= number_of_iterations; iteration_indx++)
  {
    for (read_write_indx = 1; read_write_indx <= number_of_read_writes; read_write_indx++)
    {
      /* append to the file (creates it if it does not exist) */
      outfile = fopen("out.dat", "a+");
      fprintf(outfile, "%d\n", read_write_indx); /* write */
      fclose(outfile);

      /* CPU load: a simple arithmetic loop */
      for (compute_indx = 1; compute_indx <= number_of_computes; compute_indx++)
      {
        valu = (compute_indx + 1) * 23;
      }

      /* read back the first value from the file */
      infile = fopen("out.dat", "r");
      fscanf(infile, "%d", &temp_read);
      fclose(infile);
    }
  }

  time_end = clock();
  cpuTime = ((double)(time_end - time_start)) / CLOCKS_PER_SEC;
  printf("CPU time= %f seconds\n", cpuTime);
  return 0;
}
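
(For anyone who wants to reproduce this: a static ARM binary can be built with 
something like

arm-linux-gnueabi-gcc -static -o bench_linaro.lex bench.c

though the exact compiler name/prefix depends on which Linaro or Sourcery 
toolchain you have installed.)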




From: gem5-users-boun...@gem5.org [mailto:gem5-users-boun...@gem5.org] On 
Behalf Of Hamid Reza Khaleghzadeh
Sent: Tuesday, October 23, 2012 10:26 AM
To: gem5 users mailing list
Subject: Re: [gem5-users] gem5 versus MARSS

Thanks for your answer. Ruby is a module in gem5 that simulates the memory 
hierarchy. Suppose there is an application whose execution time is 20 ms on a 
real system; gem5 simulates it in about 15 minutes. How fast is MARSS86's 
simulation?
On Tue, Oct 23, 2012 at 5:27 PM, Payne, Benjamin 
<bpa...@lps.umd.edu> wrote:
Hello,

I'm not familiar with what you are referring to by the Ruby module - is that an 
add-on for gem5?

You have a good question, but how would I quantify the difference in simulation 
speeds between MARSS and gem5? Is there an established benchmark to run?

Kindly,


Ben Payne

From: gem5-users-boun...@gem5.org [mailto:gem5-users-boun...@gem5.org] On 
Behalf Of Hamid Reza Khaleghzadeh
Sent: Tuesday, October 23, 2012 9:31 AM
To: gem5 users mailing list
Subject: Re: [gem5-users] gem5 versus MARSS

I have a question about MARSS. As you know, gem5's simulation speed with the 
Ruby module is very slow. May I ask what MARSS's simulation speed is like?

Thanks
On Tue, Oct 23, 2012 at 2:26 AM, Andreas Hansson 
<andreas.hans...@arm.com> wrote:
Hi Benjamin,
The list is long. gem5 has (amongst other things):

a variety of CPU models that are orthogonal to the ISA: atomic for speed, 
in-order and O3 for detailed uarch models

BSD license (thus both academia and companies are involved and contributing)

full-system ready-to-run Android disk images and configurations, not just your 
average chip-multi-processor, but also heterogeneous application-processor-like 
systems with state-of-the-art CPU models

a very active (and large) user community


Ultimately using one or the other really depends on what problem it is you want 
to address.

Andreas

From: Payne, Benjamin <bpa...@lps.umd.edu>
Reply-To: gem5 users mailing list <gem5-users@gem5.org>
Date: Monday, 22 October 2012 22:06
To: "gem5-users@gem5.org" <gem5-users@gem5.org>
Subject: [gem5-users] gem5 versus MARSS

Hello,

What is the difference between gem5 (http://gem5.org/Main_Page) and MARSS, the 
Micro-ARchitectural and System Simulator for x86-based Systems 
(http://marss86.org/~marss86/index.php/Home)?

As far as I can tell:
-gem5 supports the Alpha, ARM, SPARC, and x86 instruction set architectures, 
whereas MARSS is x86-only.
-gem5 can be integrated into the Structural Simulation Toolkit, whereas MARSS 
has not been.
-both gem5 and MARSS can simulate multiple cores.
-both gem5 and MARSS can use DRAMSim2.

Please correct me if any of these statements are incorrect.

Are there any other considerations?

Thank you,


Ben Payne
http://mst.edu/~bhpxc9/
Suite 450, Room S452
5520 Research Park Drive
Catonsville, MD 21228-4870
Laboratory for Physical Sciences
http://www.lps.umd.edu/
office: 443-654-7890
cell: 608-308-2413



--
Hamid Reza Khaleghzadeh
http://hkhaleghzadeh.webs.com





--
Hamid Reza Khaleghzadeh
http://hkhaleghzadeh.webs.com







_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
