Does anyone have any thoughts on my finding?
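For scale, note the size argument in frame #5 of the backtrace quoted below (size=5548434767986339840). A quick back-of-the-envelope check (assuming standard 4 KiB x86 pages; this is arithmetic, not gem5 code) suggests why PageTable::map would never finish:

```python
# Suspicious size argument passed to PageTable::map (frame #5 below).
size = 5548434767986339840
PAGE_SIZE = 4096  # assuming standard 4 KiB x86 pages

# Mapping the region page by page means one hash-map insert per page.
pages = size // PAGE_SIZE
print(pages)  # over 10**15 entries -- far beyond any realistic page table
```

Over 10^15 page-table entries would explain both the apparent hang and the steady memory growth: the loop is still running rather than leaking in the classic sense, and it hints that the mmap length argument itself is garbage rather than the hashtable being buggy.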

On 4/27/12, Mahmood Naderan <mahmood...@gmail.com> wrote:
> OK, I think I found the bug. I used "continue" and "ctrl+c" multiple
> times to see if it was stuck at a particular function. The backtrace
> shows:
>
>
> #0  0x00000000004dfee7 in __gnu_cxx::hashtable<std::pair<unsigned long
> const, X86ISA::TlbEntry>, unsigned long, __gnu_cxx::hash<unsigned
> long>, std::_Select1st<std::pair<unsigned long const,
> X86ISA::TlbEntry> >, std::equal_to<unsigned long>,
> std::allocator<X86ISA::TlbEntry> >::_M_bkt_num_key (this=0x28b8970,
>     __key=@0x21d9a7c8, __n=50331653) at
> /usr/include/c++/4.4/backward/hashtable.h:590
> #1  0x00000000004dfff9 in __gnu_cxx::hashtable<std::pair<unsigned long
> const, X86ISA::TlbEntry>, unsigned long, __gnu_cxx::hash<unsigned
> long>, std::_Select1st<std::pair<unsigned long const,
> X86ISA::TlbEntry> >, std::equal_to<unsigned long>,
> std::allocator<X86ISA::TlbEntry> >::_M_bkt_num (this=0x28b8970,
>     __obj=..., __n=50331653) at
> /usr/include/c++/4.4/backward/hashtable.h:594
> #2  0x00000000004df9c8 in __gnu_cxx::hashtable<std::pair<unsigned long
> const, X86ISA::TlbEntry>, unsigned long, __gnu_cxx::hash<unsigned
> long>, std::_Select1st<std::pair<unsigned long const,
> X86ISA::TlbEntry> >, std::equal_to<unsigned long>,
> std::allocator<X86ISA::TlbEntry> >::resize (this=0x28b8970,
>     __num_elements_hint=25165844) at
> /usr/include/c++/4.4/backward/hashtable.h:1001
> #3  0x00000000004df100 in __gnu_cxx::hashtable<std::pair<unsigned long
> const, X86ISA::TlbEntry>, unsigned long, __gnu_cxx::hash<unsigned
> long>, std::_Select1st<std::pair<unsigned long const,
> X86ISA::TlbEntry> >, std::equal_to<unsigned long>,
> std::allocator<X86ISA::TlbEntry> >::find_or_insert (this=0x28b8970,
>     __obj=...) at /usr/include/c++/4.4/backward/hashtable.h:789
> #4  0x00000000004deaca in __gnu_cxx::hash_map<unsigned long,
> X86ISA::TlbEntry, __gnu_cxx::hash<unsigned long>,
> std::equal_to<unsigned long>, std::allocator<X86ISA::TlbEntry>
>>::operator[] (this=0x28b8970,
>     __key=@0x7fffffffba80) at /usr/include/c++/4.4/ext/hash_map:217
> #5  0x00000000004daa68 in PageTable::map (this=0x28b8970,
> vaddr=47015569313792, paddr=103079288832,
>     size=5548434767986339840, clobber=false) at
> build/X86/mem/page_table.cc:82
> #6  0x000000000074b9c8 in Process::allocateMem (this=0x30be640,
> vaddr=46912496128000,
>     size=5548434871059525632, clobber=false) at
> build/X86/sim/process.cc:332
> #7  0x00000000007aba21 in mmapFunc<X86Linux64> (desc=0x2052fb8, num=9,
> p=0x30be640, tc=0x3331210)
>     at build/X86/sim/syscall_emul.hh:1069
> #8  0x000000000073ca11 in SyscallDesc::doSyscall (this=0x2052fb8,
> callnum=9, process=0x30be640,
>     tc=0x3331210) at build/X86/sim/syscall_emul.cc:69
> #9  0x00000000007516a0 in LiveProcess::syscall (this=0x30be640,
> callnum=9, tc=0x3331210)
>     at build/X86/sim/process.cc:590
> #10 0x0000000000c10ce3 in SimpleThread::syscall (this=0x33305d0, callnum=9)
>     at build/X86/cpu/simple_thread.hh:384
>
>
>
> As you can see, there is a problem with the mmapFunc<X86Linux64> syscall,
> which allocates memory through Process::allocateMem.
> That is my understanding, at least.
>
>
>
> On 4/27/12, Mahmood Naderan <mahmood...@gmail.com> wrote:
>> Is this useful?
>>
>> 339051500: system.cpu + A0 T0 : 0x83d48d.4  :   CALL_NEAR_I : wrip   ,
>> t7, t1 : IntAlu :
>> 339052000: system.cpu.icache: ReadReq (ifetch) 452f90 hit
>> 339052000: system.cpu + A0 T0 : 0x852f90    : mov       r10, rcx
>> 339052000: system.cpu + A0 T0 : 0x852f90.0  :   MOV_R_R : mov   r10,
>> r10, rcx : IntAlu :  D=0x0000000000000022
>> 339052500: system.cpu.icache: ReadReq (ifetch) 452f90 hit
>> 339052500: system.cpu + A0 T0 : 0x852f93    : mov       eax, 0x9
>> 339052500: system.cpu + A0 T0 : 0x852f93.0  :   MOV_R_I : limm   eax,
>> 0x9 : IntAlu :  D=0x0000000000000009
>> 339053000: system.cpu.icache: ReadReq (ifetch) 452f98 hit
>> ^C
>> Program received signal SIGINT, Interrupt.
>> 0x00000000004e0f90 in
>> std::__fill_n_a<__gnu_cxx::_Hashtable_node<std::pair<unsigned long
>> const, X86ISA::TlbEntry> >**, unsigned long,
>> __gnu_cxx::_Hashtable_node<std::pair<unsigned long const,
>> X86ISA::TlbEntry> >*> (__first=0x7fff70017000, __n=4065295,
>> __value=@0x7fffffffb8d0)
>>     at /usr/include/c++/4.4/bits/stl_algobase.h:758
>> 758             *__first = __tmp;
>> (gdb) ^CQuit
>> (gdb)
>>
>>
>>
>> On 4/27/12, Steve Reinhardt <ste...@gmail.com> wrote:
>>> Perhaps you could fire off the run under gdb, and use the --debug-break
>>> flag to drop into gdb at the tick where it seems to stop running.  If the
>>> simulation stops and memory blows up, it's almost like you're stuck in
>>> some subtle infinite loop with a memory allocation in it.  (You might
>>> have to continue just a little past there and hit ctrl-c before it dies
>>> to catch it in the middle of this loop.)
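Concretely, that suggestion might look like the following (a sketch only: the tick value and the flags are taken from the command line quoted later in this thread, and paths are assumed relative to a gem5 checkout):

```shell
# Launch gem5 under gdb; --debug-break drops into the debugger at the
# given tick (the last tick that printed progress in the trace).
gdb --args ./build/X86/m5.debug --debug-break=339069500 \
    configs/example/se.py -c tonto_base.amd64-m64-gcc44-nn \
    --cpu-type=detailed --caches --l2cache
# inside gdb:
#   (gdb) run
#   (gdb) bt        # once the break fires, inspect the backtrace
```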
>>>
>>> On Fri, Apr 27, 2012 at 11:29 AM, Mahmood Naderan
>>> <mahmood...@gmail.com> wrote:
>>>
>>>> I searched for something similar (stopping the simulation when it
>>>> reaches a specific memory usage, to prevent it being killed) but didn't
>>>> find such a thing. Do you know of one?
>>>>
>>>> I also attached gdb. It doesn't show anything useful because in the end
>>>> the process gets killed.
>>>>
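One generic workaround (a sketch, not a gem5 feature: this uses the POSIX rlimit mechanism via Python's standard resource module, and the 28 GB threshold is an arbitrary value just under this machine's 30 GB) is to cap the process's address space so a runaway allocation fails cleanly instead of triggering the OOM killer:

```python
import resource

def cap_address_space(max_bytes):
    """Cap this process's virtual address space so that a runaway
    allocation fails with MemoryError / ENOMEM instead of letting
    the kernel OOM killer terminate the process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)  # cannot exceed the hard limit
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))
    return resource.getrlimit(resource.RLIMIT_AS)[0]

# Hypothetical threshold: cap at ~28 GB, then exec the simulator from
# this process; gdb can then catch the failed allocation in the act.
limit = cap_address_space(28 * 1024 ** 3)
print(limit)
```

The same effect is available from the shell with `ulimit -v` (which takes a size in KiB) before launching m5.debug; under gdb the failing allocation then surfaces as a catchable error rather than a kill.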
>>>> On 4/27/12, Gabe Black <gbl...@eecs.umich.edu> wrote:
>>>> > Valgrind should tell you where the leaked memory was allocated. You may
>>>> > have to give it a command-line option for that, or stop it before it
>>>> > gets itself killed.
>>>> >
>>>> > Gabe
>>>> >
>>>> > On 04/27/12 11:10, Steve Reinhardt wrote:
>>>> >> Can you attach gdb when it does this, see where it's at, and maybe
>>>> >> step through the code a bit to see what it's doing?
>>>> >>
>>>> >> On Fri, Apr 27, 2012 at 10:54 AM, Mahmood Naderan
>>>> >> <mahmood...@gmail.com> wrote:
>>>> >>
>>>> >>     That was a guess. As I said, I turned on the debug trace to see
>>>> >>     when it starts eating memory. As you can see, the last message it
>>>> >>     prints is:
>>>> >>     339069000: system.cpu + A0 T0 : 0x852f93.0  :   MOV_R_I : limm   eax, 0x9 : IntAlu :  D=0x0000000000000009
>>>> >>     339069500: system.cpu.icache: set be: moving blk 452f80 to MRU
>>>> >>     339069500: system.cpu.icache: ReadReq (ifetch) 452f98 hit
>>>> >>
>>>> >>     Then no further messages are printed, and I see with the top
>>>> >>     command that the memory usage goes up and up until it consumes
>>>> >>     all memory.
>>>> >>
>>>> >>
>>>> >>     On 4/27/12, Nilay Vaish <ni...@cs.wisc.edu> wrote:
>>>> >>     > How do you know the instruction at which the memory starts
>>>> >>     > leaking? What should we conclude from the instruction trace in
>>>> >>     > your mail? I am unable to arrive at any conclusion from the
>>>> >>     > valgrind report that you attached. Apart from the info on
>>>> >>     > uninitialized values, I did not find any useful output produced
>>>> >>     > by valgrind.
>>>> >>     >
>>>> >>     > --
>>>> >>     > Nilay
>>>> >>     >
>>>> >>     > On Fri, 27 Apr 2012, Mahmood Naderan wrote:
>>>> >>     >
>>>> >>     >> tonto with the test input uses about 4 GB and runs for about
>>>> >>     >> 2 seconds on a real machine.
>>>> >>     >>
>>>> >>     >> I also used the test input with gem5. However, again after tick
>>>> >>     >> 300000000, all 30 GB of memory is used and then gem5 is killed.
>>>> >>     >> The same behaviour occurs with the ref input...
>>>> >>     >>
>>>> >>     >> I ran the following command:
>>>> >>     >> valgrind --tool=memcheck --leak-check=full --track-origins=yes
>>>> >>     >> --suppressions=../util/valgrind-suppressions ../build/X86/m5.debug
>>>> >>     >> --debug-flags=Cache,ExecAll,Bus,CacheRepl,Context
>>>> >>     >> --trace-start=339050000 ../configs/example/se.py -c
>>>> >>     >> tonto_base.amd64-m64-gcc44-nn --cpu-type=detailed -F 5000000
>>>> >>     >> --maxtick 10000000 --caches --l2cache --prog-interval=100000
>>>> >>     >>
>>>> >>     >>
>>>> >>     >> I attach the report again. At the instruction where the memory
>>>> >>     >> leak begins, you can see:
>>>> >>     >> ...
>>>> >>     >> 339066000: system.cpu + A0 T0 : 0x83d48d    : call   0x15afe
>>>> >>     >> 339066000: system.cpu + A0 T0 : 0x83d48d.0  :   CALL_NEAR_I : limm   t1, 0x15afe : IntAlu :  D=0x0000000000015afe
>>>> >>     >> 339066500: system.cpu + A0 T0 : 0x83d48d.1  :   CALL_NEAR_I : rdip   t7, %ctrl153,  : IntAlu :  D=0x000000000083d492
>>>> >>     >> 339067000: system.cpu.dcache: set 9a: moving blk 5aa680 to MRU
>>>> >>     >> 339067000: system.cpu.dcache: WriteReq 5aa6b8 hit
>>>> >>     >> 339067000: system.cpu + A0 T0 : 0x83d48d.2  :   CALL_NEAR_I : st   t7, SS:[rsp + 0xfffffffffffffff8] : MemWrite :  D=0x000000000083d492 A=0x7fffffffe6b8
>>>> >>     >> 339067500: system.cpu + A0 T0 : 0x83d48d.3  :   CALL_NEAR_I : subi   rsp, rsp, 0x8 : IntAlu :  D=0x00007fffffffe6b8
>>>> >>     >> 339068000: system.cpu + A0 T0 : 0x83d48d.4  :   CALL_NEAR_I : wrip   , t7, t1 : IntAlu :
>>>> >>     >> 339068500: system.cpu.icache: set be: moving blk 452f80 to MRU
>>>> >>     >> 339068500: system.cpu.icache: ReadReq (ifetch) 452f90 hit
>>>> >>     >> 339068500: system.cpu + A0 T0 : 0x852f90    : mov    r10, rcx
>>>> >>     >> 339068500: system.cpu + A0 T0 : 0x852f90.0  :   MOV_R_R : mov   r10, r10, rcx : IntAlu :  D=0x0000000000000022
>>>> >>     >> 339069000: system.cpu.icache: set be: moving blk 452f80 to MRU
>>>> >>     >> 339069000: system.cpu.icache: ReadReq (ifetch) 452f90 hit
>>>> >>     >> 339069000: system.cpu + A0 T0 : 0x852f93    : mov    eax, 0x9
>>>> >>     >> 339069000: system.cpu + A0 T0 : 0x852f93.0  :   MOV_R_I : limm   eax, 0x9 : IntAlu :  D=0x0000000000000009
>>>> >>     >> 339069500: system.cpu.icache: set be: moving blk 452f80 to MRU
>>>> >>     >> 339069500: system.cpu.icache: ReadReq (ifetch) 452f98 hit
>>>> >>     >>
>>>> >>     >>
>>>> >>     >> What is your opinion then?
>>>> >>     >> Regards,
>>>> >>     >>
>>>> >>     >> On 4/27/12, Steve Reinhardt <ste...@gmail.com> wrote:
>>>> >>     >>> Also, if you do run valgrind, use the util/valgrind-suppressions
>>>> >>     >>> file to suppress spurious reports.  Read the valgrind docs to
>>>> >>     >>> see how this works.
>>>> >>     >>>
>>>> >>     >>> Steve
>>>> >>     >>>
>>>> >>     > _______________________________________________
>>>> >>     > gem5-users mailing list
>>>> >>     > gem5-users@gem5.org
>>>> >>     > http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>>>> >>     >
>>>> >>
>>>> >>
>>>> >>     --
>>>> >>     // Naderan *Mahmood;
>>>> >>
>>>> >>
>>>> >>
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
>
>


-- 
// Naderan *Mahmood;
