Hi Ali,

I think the problem is somewhere in the compiler. I removed a printf
statement from one location and rebuilt the kernel; that particular error
went away and the simulation ran for much longer before it was killed
with an "unhandled unaligned exception".

So I tried to enable CONFIG_DEBUG_SLAB and run it again, but there seems
to be a problem between this debug mode and the slab code. It only shows
up when running with more than one core. The following BUG_ON stops the
simulation:

#if DEBUG
static void check_irq_off(void)
{
    BUG_ON(!irqs_disabled());
}

Do I have to set up anything else when enabling CONFIG_DEBUG_SLAB?

Thanks,
Pritha


On Wed, Jun 20, 2012 at 9:42 PM, Ali Saidi <sa...@umich.edu> wrote:

>
> Hi Pritha,
>
>
>
> I seem to be missing something... so the ldq_u t2, 0(a0) is loading a
> bogus address. Did the a0 address get stored by the first stq?  If so, what
> value was stored by the stq? (You can use the debug-flags to figure that
> out). Was it the right value? There are still a couple of possible issues
> here: 1) kernel bug 2) compiler bug 3) gem5 bug. You need to trace the
> source of the value back as far as possible using a combination of what
> you've done and the exec debug flag. If the value was stored and later read
> and isn't the same, something has likely gone wrong with gem5.
> Unfortunately, it's also possible there is a bug with the compiler or
> kernel.
>
>
>
> Ali
>
>
>
> On 20.06.2012 18:01, Pritha Ghoshal wrote:
>
> Hi Ali,
>
> I have a different panic now (I am not sure about the old one; it may
> also still be there). I had modified the e1000_clean_rx_irq function to
> check skb_dev_name against eth0 and process the packet only if it
> matched, just as a check. The panic occurs at the first instruction of
> strcmp:
>  fffffc00004e8ff0:
> fffffc00004e8ff0:       00 00 70 2c     ldq_u   t2,0(a0)
> This is because a0($16) holds 000032b25000ada8 which is not a valid
> address. I tried to trace back when a0 was last loaded:
>          stq $16,168($30)         # adapter, adapter
>
>         stq $18,176($30)         # work_done, work_done
>
>         stq $19,184($30)         # work_to_do, work_to_do
>
> This is at the beginning of the function e1000_clean_rx_irq :
> static bool e1000_clean_rx_irq(struct e1000_adapter *adapter,
>                    struct e1000_rx_ring *rx_ring,
>                     int *work_done, int work_to_do)
> I am not sure how to fix this. Is there a problem during compilation, so
> that a0 should have been loaded but was not? I followed the instructions
> on this site to match the assembly code with the C code:
> http://kerneltrap.org/node/3648
> I added the assembly comments after each line to trace the flow of the
> code and went through every part of the code up to the strcmp call to
> check whether a0 is loaded. Do you have any suggestions about what I can
> do next?
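
One way to narrow down where the pointer goes bad is to test it against
the kernel address range at several points before the strcmp call. The
sketch below is only an illustration: KSEG_BASE is an assumption inferred
from the fffffc00... addresses in the disassembly above, and
looks_like_kernel_ptr is a hypothetical helper, not an existing kernel
function.

```c
#include <stdint.h>

/* Assumption: kernel addresses in this build start at fffffc0000000000,
 * as suggested by the disassembly addresses quoted in this thread. */
#define KSEG_BASE 0xfffffc0000000000ULL

/* Hypothetical helper: return 1 if addr lies in the assumed kernel range. */
static int looks_like_kernel_ptr(uint64_t addr)
{
    return addr >= KSEG_BASE;
}
```

With this check, 000032b25000ada8 is rejected while fffffc00004e8ff0 is
accepted, so logging the first point at which the check fails would show
roughly where the bad value appears.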
>
> Thanks,
> Pritha
>
> On Tue, Jun 19, 2012 at 7:03 PM, Pritha Ghoshal
> <pritha9...@neo.tamu.edu> wrote:
>
>> I was able to use remote gdb with 1 core. With 4 cores, though, even
>> after connecting a remote gdb to each core, every gdb gives the same
>> output after the kernel panic:
>> (gdb) c
>> Continuing.
>> Watchdog has expired.  Target detached.
>> I am not able to get a backtrace on any of the connected gdbs.
>> Pritha
>>
>> On Tue, Jun 19, 2012 at 2:38 PM, Ali Saidi <sa...@umich.edu> wrote:
>>
>>> I think I missed that post, but you might need to connect 4 instances
>>> of gdb to the four CPUs. This doesn't happen with 1, 2 or 3 cores?
>>>
>>>
>>>
>>> You can go to every cache and add code to the inbound port or dram port
>>> that has an explicit check on that address in the packet (cache block
>>> aligned). Every time it sees a read or write you should print out the fact
>>> that the write happened and at some point hopefully you'll find the bad
>>> piece of data.
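
A minimal sketch of the comparison such a check would make, written as
plain C for illustration only. gem5 itself is C++, and none of the names
below (BLOCK_SIZE, block_align, touches_watched_block, watch_access) are
gem5 APIs; they are hypothetical helpers. The 64-byte block size is also
an assumption and should match the simulated cache configuration.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed cache block size in bytes (must be a power of two). */
#define BLOCK_SIZE 64ULL

/* Round an address down to the start of its cache block. */
static uint64_t block_align(uint64_t addr)
{
    return addr & ~(BLOCK_SIZE - 1);
}

/* Return 1 if the access [addr, addr + size) touches the cache block
 * that contains watch_addr, 0 otherwise. */
static int touches_watched_block(uint64_t addr, uint64_t size,
                                 uint64_t watch_addr)
{
    uint64_t first = block_align(addr);
    uint64_t last = block_align(addr + (size ? size - 1 : 0));
    uint64_t watched = block_align(watch_addr);

    return first <= watched && watched <= last;
}

/* Hypothetical hook: call it wherever a port sees a read or write.
 * Prints one line per matching access and returns 1 on a match. */
static int watch_access(const char *where, int is_write, uint64_t addr,
                        uint64_t size, uint64_t watch_addr)
{
    if (!touches_watched_block(addr, size, watch_addr))
        return 0;

    printf("%s: %s addr=%#llx size=%llu\n", where,
           is_write ? "write" : "read",
           (unsigned long long)addr, (unsigned long long)size);
    return 1;
}
```

Comparing block-aligned addresses rather than exact ones matters because
caches move whole blocks, so the watched location can be read or written
as part of a request whose start address differs from it.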
>>>
>>>
>>>
>>> Ali
>>>
>>>
>>>
>>> On 19.06.2012 14:31, Pritha Ghoshal wrote:
>>>
>>> Hi Ali,
>>>
>>> I am having some trouble using gdb on a 4-core machine (I had posted a
>>> previous mail to the group about that). I'll try it once more and see.
>>>
>>> How could I add the memory checks?
>>>
>>> Thanks,
>>> Pritha
>>>
>>> On Tue, Jun 19, 2012 at 2:02 PM, Ali Saidi <sa...@umich.edu> wrote:
>>>
>>>>
>>>>
>>>> On 19.06.2012 13:06, Pritha Ghoshal wrote:
>>>>
>>>>  Hi,
>>>> I am getting a kernel panic which I am not able to debug. The PC itself
>>>> is getting corrupted. I have added the trace of the panic at the end of
>>>> the email.
>>>> This is a snippet from the object dump of the kernel code:
>>>> fffffc00005d51e8:       00 00 69 a7     ldq     t12,0(s0)
>>>> fffffc00005d51ec:       00 40 5b 6b     jsr     ra,(t12),fffffc00005d51f0
>>>> fffffc00005d51f0:       2a 00 ba 27     ldah    gp,42(ra)
>>>> The panic occurs when ra = fffffc00005d51f0, so the jsr must have
>>>> jumped to the address in t12, which is 0000000002969588. t12 is loaded
>>>> from the address in s0 in the previous instruction. I was unable to
>>>> trace back the contents of that memory address; is there a way to do
>>>> it? The last function in the trace is at the following link:
>>>> http://lxr.free-electrons.com/source/net/core/neighbour.c?v=2.6.28#L1187
>>>> Could someone suggest how I should go about debugging this kernel panic?
>>>> Thanks in advance.
>>>> Thanks,
>>>> Pritha
>>>>
>>>> You'll need to either use the gdb support in gem5 or maybe put some
>>>> checks in the memory system for that specific address and print as it gets
>>>> changed.
>>>> Ali
>>>>
>>>>
>>>> _______________________________________________
>>>> gem5-users mailing list
>>>> gem5-users@gem5.org
>>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>>>
>>>
>>>
>>>
>>
>
>