Re: [gem5-users] A Patch for DRAMsim2 Integration

Ali Saidi Sun, 06 May 2012 09:01:54 -0700

Hi Andrew,

Could you add some code to the table walker to see how big the following are 
getting:
stateQueueL1.size()
stateQueueL2.size()
pendingQueue.size()


Perhaps we're somehow getting into a loop where there are a lot of
translations to invalid addresses that get squashed and pile up in the
table walker?
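
As a rough sketch of what to do with those numbers once they are printed:
assuming a DPRINTF (hypothetical, not something in the tree) is added to the
walker and emits the three sizes as name=value pairs on one line, a small
Python script could report the peak of each queue. The line format, the
`system.cpu.dtb.walker` object name, and the function name
`peak_queue_sizes` are all made up for illustration.

```python
import re
from typing import Dict, Iterable

# ASSUMPTION: a hypothetical DPRINTF in the table walker prints lines such as
#   "1000: system.cpu.dtb.walker: stateQueueL1=1 stateQueueL2=0 pendingQueue=2"
# This format is invented for this sketch; it is not existing gem5 output.
QUEUE_NAMES = ("stateQueueL1", "stateQueueL2", "pendingQueue")
SIZE_RE = re.compile(r"(stateQueueL1|stateQueueL2|pendingQueue)=(\d+)")


def peak_queue_sizes(lines: Iterable[str]) -> Dict[str, int]:
    """Return the largest size seen for each queue across all trace lines."""
    peaks = {name: 0 for name in QUEUE_NAMES}
    for line in lines:
        # A line may carry any subset of the three name=value pairs.
        for name, size in SIZE_RE.findall(line):
            peaks[name] = max(peaks[name], int(size))
    return peaks


# Two made-up trace lines to show the call.
sample = [
    "1000: system.cpu.dtb.walker: stateQueueL1=1 stateQueueL2=0 pendingQueue=2",
    "2000: system.cpu.dtb.walker: stateQueueL1=3 stateQueueL2=1 pendingQueue=40",
]
print(peak_queue_sizes(sample))
```

If one of the peaks keeps growing along with the in-flight instruction count,
that would point at the walker as the place where things accumulate.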

Thanks,
Ali



On May 4, 2012, at 7:53 AM, Gabriel Michael Black wrote:

> I haven't had a chance to study what's going on here, but could the problem 
> be that we don't have bandwidth limits/back pressure implemented for the TLB 
> and delayed translation? It could be that the CPU is pumping instructions 
> into translation which eventually drain out/are squashed, and if too many 
> accumulate they trip that assert.
> 
> That may not actually make any sense as far as what the code is actually 
> doing, but it occurred to me as a possibility and I thought I'd throw it out 
> there.
> 
> Gabe
> 
> Quoting Andrew Cebulski <af...@drexel.edu>:
> 
>> I double-checked by looking at the config.ini file.  It turns out I did
>> actually create the checkpoint with an Atomic CPU without caches.  Sorry
>> for the confusion.
>> 
>> -Andrew
>> 
>> On Wed, May 2, 2012 at 10:12 PM, Andrew Cebulski <af...@drexel.edu> wrote:
>> 
>>> I started hitting this assertion (that the number of insts in flight was
>>> greater than 1500) before I started using a checkpoint.  I created the
>>> checkpoint afterwards to decrease the time needed to run simulations to
>>> debug this problem.  I'll create a new checkpoint, then send the new trace
>>> output.
>>> 
>>> -Andrew
>>> 
>>> 
>>> On Wed, May 2, 2012 at 9:53 PM, Ali Saidi <sa...@umich.edu> wrote:
>>> 
>>>> 
>>>> It's likely the cause for all of your problems. Dirty data in the caches
>>>> doesn't get restored either.  You should always create checkpoints with an
>>>> atomic cpu and without caches.
>>>> 
>>>> 
>>>> 
>>>> Ali
>>>> 
>>>> 
>>>> 
>>>> On 02.05.2012 21:23, Andrew Cebulski wrote:
>>>> 
>>>> Sorry, I created the checkpoint I referred to with an O3 CPU with caches.
>>>> From what I recall reading, caches don't get restored from checkpoints.
>>>> Since the checkpoint wasn't during the benchmark run, I assumed that was
>>>> okay.
>>>> -Andrew
>>>> 
>>>> On Wed, May 2, 2012 at 9:07 PM, Ali Saidi <sa...@umich.edu> wrote:
>>>> 
>>>>> You haven't answered the question about whether you created the
>>>>> checkpoints with an atomic cpu without caches.
>>>>> 
>>>>> Ali
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On 02.05.2012 19:58, Andrew Cebulski wrote:
>>>>> 
>>>>> I have not run with the checker CPU recently.  Here's the stderr output
>>>>> from a run I did a while back:
>>>>> http://dl.dropbox.com/u/2953302/gem5/err.0
>>>>> Note that the instruction match error is before my benchmark actually
>>>>> starts running.  The start of my boot script checks to see if my files
>>>>> image is mounted (which it is), then continues on to run the benchmark.  I
>>>>> booted the system, mounted my files image, then took a checkpoint.  I've
>>>>> been running all my tests from that checkpoint.  I found where my benchmark
>>>>> started based on the ASID (from ExecAsid debug flag).
>>>>> I delayed the start of gathering trace data until the second-to-last
>>>>> linear increase in dynamic instructions in-flight.  I'm running a new trace
>>>>> now.
>>>>> -Andrew
>>>>> 
>>>>> 
>>>>> On Wed, May 2, 2012 at 5:28 PM, Ali Saidi <sa...@umich.edu> wrote:
>>>>> 
>>>>>> Something is wrong well before this point. There is no reason that
>>>>>> address 0x0 or 0x4 should be translated.
>>>>>> 
>>>>>> Did you happen to create a checkpoint when caches were in the system?
>>>>>> 
>>>>>> Have you tried to run with the checker cpu and see if it detects any
>>>>>> errors?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Ali
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 02.05.2012 17:22, Andrew Cebulski wrote:
>>>>>> 
>>>>>> They are data TLB misses that occur as the in-flight instruction count
>>>>>> rises (at 0x0 and 0x4).  The last TLB miss before the in-flight instruction
>>>>>> count finally linearly decreases is to 0x200.  Also, at the start of the
>>>>>> rising slope, I see a miss to 0x8 and 0x2508c.
>>>>>> Here's a trace file:
>>>>>> http://dl.dropbox.com/u/2953302/gem5/tlb.out
>>>>>> To reduce size, I just have lines that have either TLB or walker in
>>>>>> them.
>>>>>> I do see only a handful of instruction TLB misses.
>>>>>> -Andrew
>>>>>> 
>>>>>> On Wed, May 2, 2012 at 11:10 AM, Ali Saidi <sa...@umich.edu> wrote:
>>>>>> 
>>>>>>> Hi Andrew,
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks for digging into this. I think there is an issue somewhere, but
>>>>>>> I'm still not sure where.
>>>>>>> 
>>>>>>> Ali
>>>>>>> 
>>>>>>> On 01.05.2012 23:34, Andrew Cebulski wrote:
>>>>>>> 
>>>>>>> Okay, I'm positive now that the issue lies with delayed translations
>>>>>>> that are squashed before finishing.
>>>>>>> 
>>>>>>> On the data or instruction side? You seem to allude to data in the
>>>>>>> paragraph below, but then instructions in the latter text.
>>>>>>> 
>>>>>>> It seems to me like speculative load/stores are being executed,
>>>>>>> rather than waiting for the instructions to commit.  Once the instructions
>>>>>>> begin getting (speculatively) executed in the TLB, a reference is left
>>>>>>> there, which seems hard to root out and dereference after the instruction
>>>>>>> ends up being squashed.  At least, I have not been able to find that out in
>>>>>>> the source code as of yet.  Can anyone clarify on this?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> There should only be one translation outstanding from each
>>>>>>> instruction and data side walker. Any nested transactions should be queued
>>>>>>> in the walker. Until one finishes, I'm not sure how multiple would ever be
>>>>>>> outstanding.
>>>>>>> 
>>>>>>> Recall the following image that shows how the number of dynamic
>>>>>>> instruction (DynInst) objects in-flight increases linearly for varying
>>>>>>> periods of time:
>>>>>>> http://dl.dropbox.com/u/2953302/gem5/dyninst_vs_dyninstptr_dramsim2.png
>>>>>>> After enabling the TLB debug flag, I see that the linear increase in
>>>>>>> instructions in flight is proportional to the number of TLB misses.  These
>>>>>>> TLB misses have a much larger delay (resulting in translation delays) due
>>>>>>> to the fact that DramSim2 models the memory system more accurately.  It
>>>>>>> seems that with the classic memory system, TLB misses often do not have
>>>>>>> translation delays.  For whatever reason, it would also seem that every
>>>>>>> instruction that has a TLB miss also is eventually squashed...
>>>>>>> 
>>>>>>> From a data side perspective this is reasonable. While a miss is
>>>>>>> outstanding at some point instructions will stop committing and thus the
>>>>>>> instructions in flight will begin to rise until the miss is satisfied.
>>>>>>> 
>>>>>>> Here's a summary of outputs from my trace.  These two DPRINTF
>>>>>>> messages appear on the rising slopes (repeated up until the peak):
>>>>>>> TLB Miss: Starting hardware table walker for 0(656)
>>>>>>> TLB Miss: Starting hardware table walker for 0x4(656)
>>>>>>> 
>>>>>>> This is interesting/odd. I don't know a good reason why (1) a miss
>>>>>>> would be outstanding to both address 0 and address 4 at the same time. In
>>>>>>> almost all cases these pages are marked as no-access to detect segfaults.
>>>>>>> Perhaps there is an issue where the cpu is getting into a loop faulting on
>>>>>>> a bad access and then faulting again on the fault handler. I could imagine
>>>>>>> this would happen if there was some corruption in the memory system (for
>>>>>>> example the timings in dramsim exposing a bug in the cache models or
>>>>>>> something).
>>>>>>> 
>>>>>>> 
>>>>>>> At the peak, the following message appears (from fetch) almost every
>>>>>>> tick for (what I believe to be) every single one of the table walkers that
>>>>>>> were squashed.
>>>>>>> Fetch is waiting ITLB walk to finish!
>>>>>>> 
>>>>>>> There must be another walk in flight? The instruction side will only
>>>>>>> have one fault outstanding at once. Successive branch mispredicts will
>>>>>>> re-direct fetch but there is code that catches the fact that a different
>>>>>>> walk completed than expected and "does the right thing."
>>>>>>> 
>>>>>>> The problem is that these ITLB table walks are for instructions that
>>>>>>> were squashed as much as 0.3 billion cycles earlier, and have since been
>>>>>>> removed from the CPU's instruction list.
>>>>>>> 
>>>>>>> I'm not following here.
>>>>>>> 
>>>>>>> Any help will be greatly appreciated in solving this problem.  I've
>>>>>>> hit a roadblock with getting Ruby working with ARM, most likely due to the
>>>>>>> fact that ARM has disjoint memory (x86 and Alpha do not).  There's the 256
>>>>>>> MB for physical memory, then the 64 MB for the boot loader.  I brought this
>>>>>>> up in my last email about trying to get Ruby working.  Therefore, I'm
>>>>>>> trying to get this DramSim2 integration fixed so I can start modeling FS
>>>>>>> with DRAM memory.
>>>>>>> 
>>>>>>> Brad/Steve/Nilay anyone have a suggestion on how to make this work?
>>>>>>> 
>>>>>>> 
>>>>>>> Note that these problems also occur in Soplex from the SPEC CPU2006
>>>>>>> benchmark suite (also hits 1500 in-flight instructions assertion).  Due to
>>>>>>> time constraints, I haven't tested on other benchmarks.
>>>>>>> Thanks,
>>>>>>> Andrew
>>>>>>> On Tue, May 1, 2012 at 4:27 AM, Andrew Cebulski <af...@drexel.edu> wrote:
>>>>>>> 
>>>>>>>> Hey Gabe,
>>>>>>>>    Thanks for this...very helpful.  I just recently got back into
>>>>>>>> debugging this problem.  I made a small change in src/base/refcnt.hh to
>>>>>>>> allow me to return the current count of references to a DynInst object.
>>>>>>>>    I then modified existing DPRINTFs to also print out reference
>>>>>>>> counts, then added some of my own when I needed extra visibility.
>>>>>>>>    I've found one memory store instruction that seems to be getting
>>>>>>>> lost.  What's happening is that it progresses as far as getting executed in
>>>>>>>> the IEW once, but a delayed translation occurs, deferring the store.  By
>>>>>>>> the time it reenters the IEW, the IQ has marked the instruction as
>>>>>>>> squashed.  Everything progresses as usual from here on out, with one
>>>>>>>> exception.  When the instruction is removed from the CPU's instruction list,
>>>>>>>> there is one reference count hanging.
>>>>>>>>    I've added in some additional debugging for my traces to help
>>>>>>>> narrow down where this reference is coming from.  As far as I can tell,
>>>>>>>> it's because of a call to initiateAcc() within the executeStore function in
>>>>>>>> the lsq unit.  Please see the following two traces.  The first trace shows
>>>>>>>> what I just discussed.  The second trace is another memory store
>>>>>>>> instruction that got squashed, however, it was squashed upon its first
>>>>>>>> entry into the IEW, therefore it never started execution.
>>>>>>>> http://dl.dropbox.com/u/2953302/gem5/lostinstruction.out
>>>>>>>> http://dl.dropbox.com/u/2953302/gem5/similarinstruction.out
>>>>>>>>    Let me know if you have any ideas based on these two instruction
>>>>>>>> traces.  I do not understand how the initiateAcc function results in
>>>>>>>> another reference, but maybe someone else does....  Since I don't see how
>>>>>>>> it makes a reference, it's hard to find out how to make sure it gets
>>>>>>>> dereferenced...
>>>>>>>>    Unfortunately, I haven't been able to add a DPRINTF in
>>>>>>>> src/base/refcnt.hh ...this would make things more clear (i.e. exactly when
>>>>>>>> references/dereferences occur).  Let me know if you have any advice on
>>>>>>>> this...if it's possible.  I can't seem to get the right include files, and
>>>>>>>> likely right SConscript compile order...
>>>>>>>> Thanks,
>>>>>>>> Andrew
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sat, Apr 7, 2012 at 9:48 PM, Gabe Black <gbl...@eecs.umich.edu> wrote:
>>>>>>>> 
>>>>>>>>> Without digging into things too deeply, it looks like you may be
>>>>>>>>> leaking references to dynamic instructions. The CPU may think it's done
>>>>>>>>> with one, but until that final reference is removed, the object will hang
>>>>>>>>> around forever. I think I've had problems before where the reference
>>>>>>>>> count ended up off by one somehow and instructions would start piling up.
>>>>>>>>> It's also possible that a clog develops in O3's pipeline and some internal
>>>>>>>>> structure stops letting instructions through and starts accumulating them.
>>>>>>>>> Either of these problems will be annoying to track down, but with enough
>>>>>>>>> digging I've been able to fix these sorts of things.
>>>>>>>>> 
>>>>>>>>> This may have more to do with O3 not handling the benchmark you're
>>>>>>>>> running well rather than a problem with your new DRAM model. There may be
>>>>>>>>> some interaction between the two, though, where the new memory makes the
>>>>>>>>> timing line up to cause O3 to behave poorly. What you can do is instrument
>>>>>>>>> dynamic instruction creation and destruction and reference counting (try
>>>>>>>>> printing "this" for both the reference counting wrapper and the dyn inst
>>>>>>>>> itself) and turn it on as close as you can to where things go bad tick
>>>>>>>>> wise. Then look for an instruction which gets lost, and look for where its
>>>>>>>>> reference count is incremented and decremented. It should be relatively
>>>>>>>>> easy to pair up where references are created and destroyed, and you should
>>>>>>>>> be able to identify the reference which never goes away. Then you need to
>>>>>>>>> figure out where that reference is being created. After that, you should
>>>>>>>>> have enough information to identify why the reference counting isn't being
>>>>>>>>> done correctly. It's arduous, but that's the only way.
>>>>>>>>> 
>>>>>>>>> It's important to also make sure reference counts aren't decremented
>>>>>>>>> to zero prematurely. I had a problem once where that happened and the
>>>>>>>>> memory behind the object was updated by something that didn't know it was
>>>>>>>>> dead. The memory had since been reallocated to another object of the same
>>>>>>>>> type, so that other object reflected what happened to the phantom one. If I
>>>>>>>>> remember that manifested as something weird like an add causing a page
>>>>>>>>> fault or something.
>>>>>>>>> 
>>>>>>>>> Gabe
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 04/07/12 18:21, Andrew Cebulski wrote:
>>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> I've looked into this problem some more, and have put together a
>>>>>>>>> couple traces.  I've been becoming more familiar with how gem5 handles
>>>>>>>>> dynamic instructions, in particular how it destroys them.  I have two
>>>>>>>>> traces to compare, one with the physical memory, and the other with the
>>>>>>>>> integrated dramsim2 dram memory.  I also have two plots showing instruction
>>>>>>>>> counts over time (sim ticks).  All of these are linked at the end of the
>>>>>>>>> email.
>>>>>>>>> First, I'm going to go into what I've been able to interpret
>>>>>>>>> regarding how instructions are destroyed.  In particular, comparing when
>>>>>>>>> DynInst's vs. DynInstPtr's are deconstructed/removed from the cpu.  I
>>>>>>>>> separate these because I've seen a difference, as I discuss later.  These
>>>>>>>>> explanations are fairly non-existent on the wiki.  There is a section
>>>>>>>>> header waiting to be filled...
>>>>>>>>> From what I have been able to gather from the code, there is a list
>>>>>>>>> of all the instructions in flight in cpu/o3/cpu.cc called instList, with
>>>>>>>>> the type DynInstPtr.  There are three conditions to instructions being
>>>>>>>>> cleaned from this list:
>>>>>>>>> 1.)  The ROB retires its head instruction
>>>>>>>>> 2.)  Fetch receives a rob squashing signal from the commit,
>>>>>>>>> resulting in removing any instruction not in the ROB
>>>>>>>>> 3.)  Decode detects an incorrect branch prediction, resulting in
>>>>>>>>> removal of all instructions back to the bad seq num.
>>>>>>>>> Once all five stages have completed, the CPU cleans up all the
>>>>>>>>> removed in-flight instructions.  This line in particular
>>>>>>>>> in cleanUpRemovedInsts() in cpu/o3/cpu.cc deconstructs a DynInstPtr:
>>>>>>>>> instList.erase(removeList.front());
>>>>>>>>> When I turn on the debug flag O3CPU, I see the message "Removing
>>>>>>>>> instruction, ..." (from o3/cpu.cc) with the threadNum, seqNum and pcState
>>>>>>>>> after all 5 cpu stages have completed, and one of the conditions above is
>>>>>>>>> met.  I also see what tick it occurs on.
>>>>>>>>> When I turn on the DynInst debug flag, I see when instructions are
>>>>>>>>> created and destroyed (cpu/base_dyn_inst_impl.hh) and what tick.  From
>>>>>>>>> analyzing the trace files, I've gathered that this takes into account that
>>>>>>>>> instructions have different execution lengths.  So if on one tick a memory
>>>>>>>>> instruction in the instList (DynInstPtr) is removed, the destruction of
>>>>>>>>> the DynInst for that memory instruction will occur much later (i.e. 1M
>>>>>>>>> ticks later).  I have yet to determine how this is implemented.
>>>>>>>>> Now for the problem.
>>>>>>>>> What I'm seeing when I run dramsim2 dram memory is a significant
>>>>>>>>> difference between the size of the instList vector (of DynInstPtr objects),
>>>>>>>>> and the size of dynamic instruction count (of DynInst objects).  The
>>>>>>>>> benchmark I'm running is libquantum from SPEC 2006.  For the first roughly
>>>>>>>>> 130B ticks, the dynamic instruction count kept in cpu/base_dyn_inst_impl.hh
>>>>>>>>> shadows the instList size in o3/cpu.cc (figure linked below) very closely.
>>>>>>>>> Around tick 130B after libquantum started, it starts hitting what I'm
>>>>>>>>> assuming are loops (therefore branch prediction), resulting in some
>>>>>>>>> behavior that seems to imply improper instruction handling (i.e. more
>>>>>>>>> instructions in flight than allowed by ROB).
>>>>>>>>> I wasn't able to sync-up the physical and dramsim2 traces exactly by
>>>>>>>>> trace, but they should represent roughly the same area of execution.  They
>>>>>>>>> don't execute the same due to the dramsim2 modeling the memory differently
>>>>>>>>> (i.e. latency and other delays).
>>>>>>>>> I've shared both traces on my public Dropbox here --
>>>>>>>>> 
>>>>>>>>> http://dl.dropbox.com/u/2953302/gem5/physical-fs-040612-ROB-Commit-DynInst-Fetch-O3CPU.out.gz
>>>>>>>>> 
>>>>>>>>> http://dl.dropbox.com/u/2953302/gem5/dramsim2-fs-040612-ROB-Commit-DynInst-Fetch-O3CPU-2.out.gz
>>>>>>>>> Here are a couple plots of tick versus instruction count, with
>>>>>>>>> respect to cpu->instcount in cpu/base_dyn_inst_impl.hh and instList.size()
>>>>>>>>> in cpu/o3/cpu.cc.  --
>>>>>>>>> 
>>>>>>>>> http://dl.dropbox.com/u/2953302/gem5/dyninst_vs_dyninstptr_physical.png
>>>>>>>>> 
>>>>>>>>> http://dl.dropbox.com/u/2953302/gem5/dyninst_vs_dyninstptr_dramsim2.png
>>>>>>>>> Note that I added the printout of the instList size to an existing
>>>>>>>>> O3CPU DPRINTF in cleanUpRemovedInsts() in cpu/o3/cpu.cc.
>>>>>>>>> Here are the commands I ran to parse the traces into data files to
>>>>>>>>> analyze in MATLAB and create the plots:
>>>>>>>>> zgrep DynInst dramsim2-fs-040612-ROB-Commit-DynInst-Fetch-O3CPU-2.out.gz \
>>>>>>>>>   | grep destroyed | awk '{print $1,$11}' > cpuinstcount.out
>>>>>>>>> zgrep instList dramsim2-fs-040612-ROB-Commit-DynInst-Fetch-O3CPU-2.out.gz \
>>>>>>>>>   | awk '{print $1,$11}' > instlistsize.out
>>>>>>>>> It seems to me like the problem might lie in gem5, but has just been
>>>>>>>>> exposed by integrating this more detailed memory model, dramsim2, into
>>>>>>>>> gem5.  Either that, or there are some timing errors in how dramsim2 was
>>>>>>>>> integrated.  I doubt this, however, since those first 190B ticks executed
>>>>>>>>> used the dramsim2 memory.  I believe the problem is a combination of memory
>>>>>>>>> instructions + complex loops (branch prediction), resulting in improper
>>>>>>>>> destroying of instructions.
>>>>>>>>> I've included the ROB, Commit, Fetch, DynInst and O3CPU debug flags.
>>>>>>>>> There are 192 ROB entries, which is why the instList size generally has a
>>>>>>>>> max of about 192 instructions.  The dynamic instruction counts (seen in the
>>>>>>>>> dramsim2 plot) seem to also imply that instructions are incorrectly being
>>>>>>>>> removed from the ROB, and then from the cpu's instruction list in cpu.cc,
>>>>>>>>> which allows more and more instructions to be added to the system (possibly
>>>>>>>>> from a bad branch).
>>>>>>>>> I appreciate any help in debugging this and further figuring out the
>>>>>>>>> root problem, just let me know if you need anything else from me.  I don't
>>>>>>>>> have much more time at the moment to debug, but I can take any advice for
>>>>>>>>> quick changes and/or additional traces, then send the results back to the
>>>>>>>>> list for discussion.
>>>>>>>>> Thanks,
>>>>>>>>> Andrew
>>>>>>>>> P.S. Paul - I did try decreasing the size of the dramsim2
>>>>>>>>> transaction (and even command) queue from 512 to 32.  The same instructions
>>>>>>>>> problem occurred.  It basically just decreased the execution time.
>>>>>>>>> 
>>>>>>>>> On Wed, Mar 14, 2012 at 2:10 PM, Ali Saidi <sa...@umich.edu> wrote:
>>>>>>>>> 
>>>>>>>>>> The error is that there are more than 1500 instructions currently
>>>>>>>>>> in flight in the system. It could mean several things:
>>>>>>>>>> 
>>>>>>>>>> 1. The value is somewhat arbitrarily defined and maybe there are
>>>>>>>>>> more than 1500 in your system at one time?
>>>>>>>>>> 
>>>>>>>>>> 2. Instructions aren't being destroyed correctly
>>>>>>>>>> 
>>>>>>>>>> You could try to run a debug binary so you'll get a list of
>>>>>>>>>> instructions when it happens or increase the number which may
>>>>>>>>>> be appropriate for certain situations (but 1500 is quite a few inflight
>>>>>>>>>> instructions).
>>>>>>>>>> 
>>>>>>>>>> Ali
>>>>>>>>>> 
>>>>>>>>>> On 13.03.2012 10:56, Andrew Cebulski wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Xiangyu,
>>>>>>>>>>    I just started looking into this some more.  So at first I
>>>>>>>>>> thought it was due to updating to a more recent revision, but then I went
>>>>>>>>>> back to revision 8643, added your patch, built and ran....and now get the
>>>>>>>>>> error with it too (when running ARM_FS/gem5.opt).  I'm testing now to see
>>>>>>>>>> if an update to SWIG might have resulted in this error, maybe someone on
>>>>>>>>>> the mailing list would know if that's possible.  The difference is 1.3.40
>>>>>>>>>> vs. 2.0.3, both of which are supported according to the dependencies wiki
>>>>>>>>>> page.
>>>>>>>>>> Just for completeness, here's the error from revision 8643:
>>>>>>>>>> build/ARM_FS/cpu/base_dyn_inst_impl.hh:149: void
>>>>>>>>>> BaseDynInst::initVars() [with Impl = O3CPUImpl]: Assertion 
>>>>>>>>>> `cpu->instcount
>>>>>>>>>>   I have not tried running with gem5.debug, so I will be doing
>>>>>>>>>> that today.  Maybe this is an assertion that is occurring due to an
>>>>>>>>>> optimization.  That would mean it wouldn't be triggered in gem5.debug since
>>>>>>>>>> it runs without optimizations.  Have you tested all debug, opt and fast
>>>>>>>>>> with your tests?
>>>>>>>>>> Thanks,
>>>>>>>>>> Andrew
>>>>>>>>>> 
>>>>>>>>>> On Tue, Mar 13, 2012 at 1:37 PM, Rio Xiangyu Dong <
>>>>>>>>>> riosher...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>>  Hi Andrew,
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> I didn't see this error in my simulations. May I ask which gem5
>>>>>>>>>>> version you are using? I find some of the latest code updates do not comply
>>>>>>>>>>> with my changes. I am still using the DRAMsim2 patch on Gem5 repo8643, and
>>>>>>>>>>> have run all the runnable benchmarks in SPEC2006, SPEC2000, EEMBC2, and
>>>>>>>>>>> PARSEC2 on ARM_SE.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Thank you!
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> 
>>>>>>>>>>> Xiangyu
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> *From:* Andrew Cebulski [mailto:af...@drexel.edu]
>>>>>>>>>>> *Sent:* Thursday, March 08, 2012 6:52 PM
>>>>>>>>>>> 
>>>>>>>>>>> *To:* gem5 users mailing list
>>>>>>>>>>> *Cc:*riosher...@gmail.com; sa...@umich.edu
>>>>>>>>>>> 
>>>>>>>>>>> *Subject:* Re: [gem5-users] A Patch for DRAMsim2 Integration
>>>>>>>>>>> 
>>>>>>>>>>> Xiangyu,
>>>>>>>>>>> 
>>>>>>>>>>>   I've been having an issue recently with the number of
>>>>>>>>>>> instructions I've been seeing committed to the CPU (I have a separate
>>>>>>>>>>> thread on this).  It turns out the issue seems to be coming from this patch
>>>>>>>>>>> you created to integrate DramSim2 with Gem5.  Unfortunately, I've been
>>>>>>>>>>> running with gem5.fast, not gem5.opt.  So up until now, I haven't been
>>>>>>>>>>> seeing assertions.  I thought I'd run it with gem5.opt or debug back in
>>>>>>>>>>> December, but I must not have.  My runs on the Arm O3 cpu fail with this
>>>>>>>>>>> assertion:
>>>>>>>>>>> 
>>>>>>>>>>> build/ARM/cpu/base_dyn_inst_impl.hh:149: void
>>>>>>>>>>> BaseDynInst::initVars() [with Impl = O3CPUImpl]: Assertion 
>>>>>>>>>>> `cpu->instcount
>>>>>>>>>>> 
>>>>>>>>>>> -Andrew
>>>>>>>>>>> 
>>>>>>>>>>> Date: Sun, 18 Dec 2011 01:48:58 -0800
>>>>>>>>>>> From: "Dong, Xiangyu" <riosher...@gmail.com>
>>>>>>>>>>> To: "gem5 users mailing list" <gem5-users@gem5.org>
>>>>>>>>>>> Subject: [gem5-users] A Patch for DRAMsim2 Integration
>>>>>>>>>>> Message-ID: gmail.com>
>>>>>>>>>>> 
>>>>>>>>>>> Content-Type: text/plain; charset="us-ascii"
>>>>>>>>>>> 
>>>>>>>>>>> Hi all,
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> I have a Gem5+DRAMsim2 patch.  I've tested it under both SE and FS
>>>>>>>>>>> modes.
>>>>>>>>>>> I'm willing to share it here.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> For those who have such needs, please go to my website
>>>>>>>>>>> www.cse.psu.edu/~xydong <http://www.cse.psu.edu/%7Exydong> to
>>>>>>>>>>> download the patch and test it.  To enable DRAMSim2, use the
>>>>>>>>>>> se_dramsim2.py script instead of se.py (for FS, you can create one
>>>>>>>>>>> yourself).  The basic idea to enable the DRAMsim2 module is to use the
>>>>>>>>>>> derived DRAMMemory class instead of the PhysicalMemory class.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Please let me know if there are bugs.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Thank you!
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> 
>>>>>>>>>>> Xiangyu Dong
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> _______________________________________________
>>>>>>>>>> gem5-users mailing list
>>>>>>>>>> gem5-users@gem5.org
>>>>>>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
> 
> 
> 

_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
