On Tue, Oct 29, 2013 at 1:48 PM, Xiaoyang Zhong <[email protected]> wrote:
> Hi Eric,
>
> Thank you very much for your reply.
>
> We have thought about the event logging you mentioned in the email, and
> have a couple of questions:
>
> 1. The event log should be stored in flash, is this correct? If it is
> stored, how can I read it?

That is a problem. On my development boards we have to physically
recapture the devices so we can read the logs. On the development boards
we can physically remove the SD card that we use for this and read it on
another host.

> 2. There is an event logging component in Ctp; we are trying to use it
> and see whether it can solve part of our problems. Do you have any
> advice on using it?

I haven't looked at that, so I don't know.

> In today's tests we are using c-print to record some events in the
> application layer. We found the sendDone event had not been signaled in
> the lower layer when the node failed.

That sounds like a very suspicious finding. In event-driven
architectures, if one loses an event the whole thing can easily shut
down.

When I build this kind of stuff, I write reasonably paranoid code and
build in mechanisms that can detect this kind of thing and then blow up
and Panic (that's the mechanism that kicks the logging to save state and
then reboot). It isn't very generic, so you'd have to roll your own. But
it sounds like a good start. Now the question is why the sendDone
doesn't get signalled.

I also have an in-memory trace/log facility that I use for watching
transitions between states etc., so I can see what the state machines
are doing. Take a look at
https://github.com/MamMark/mm/tree/master/tos/system, in particular
Trace*. It needs Panic, but you can probably just comment that stuff
out.

If you want Panic and such, you should probably switch over to my main
development repository. It is quite a bit further along than the main
tinyos-main repo. It's gh:tp-freeforall/prod
<https://github.com/tp-freeforall/prod>. It supports the main new
msp430 chips, in particular the 5438a.

> Thanks again.
> Xiaoyang
>
> On Tue, Oct 29, 2013 at 1:31 AM, Eric Decker <[email protected]> wrote:
>
>> Sorry I'm not going to be more help.
>>
>> But… in real, live deployments it is critical to have some mechanism
>> for doing real-time logging. This, coupled with some degree of
>> paranoid code, allows one to log information and/or take a memory
>> (CPU state) dump for analysis when things go wrong like you are
>> describing.
>>
>> This needs to be on board and should rely on a highly robust kernel
>> of code. I've built this into my motes, but that doesn't help what
>> you are doing.
>>
>> Sorry. I've been doing this kind of work (embedded) for 20+ years,
>> and unless the node is architected to assist in debugging these kinds
>> of problems, it is very, very difficult to figure out what is going
>> on when the node goes catatonic.
>>
>> On Mon, Oct 28, 2013 at 8:16 PM, Xiaoyang Zhong <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> Our group is writing an application for an outdoor testbed using Ctp
>>> and Deluge. We are using MicaZ and IRIS motes. Before deployment, we
>>> want to do as much testing as possible to make sure the application
>>> is working fine, so that the least maintenance is needed once
>>> deployed. It is working well most of the time, but we encountered
>>> node failure in certain situations (we repeated the tests several
>>> times, and a node fails every time).
>>>
>>> We are trying to simulate the situation in tree routing where, if
>>> the bottleneck node fails and the rest of the network cannot find a
>>> valid route to the base station, the rest of the network should keep
>>> running until the bottleneck node is recovered.
>>>
>>> Our method is to turn off the base station, let the rest of the
>>> network run for a time interval (the intervals we have tested range
>>> from tens of minutes to overnight), then turn the base station on
>>> again. We hope all the nodes can reconnect to the base station.
>>> In every test, one IRIS node would fail to reconnect, and from the
>>> sniffer we observed no packets from that node (no routing packets,
>>> no data packets, no Deluge packets, nothing). We think the node is
>>> "dead", but we don't know what would cause this. Deluge packets are
>>> not a problem, because we disabled Deluge beacons in the node's
>>> booted event. And RAM size is also not a problem: our application
>>> uses about 4K on an IRIS node, while the RAM of an IRIS node is 8K.
>>>
>>> Has anyone encountered a problem like this, where a node suddenly
>>> stops working?
>>>
>>> Any help will be highly appreciated!
>>>
>>> Best,
>>> Xiaoyang
>>>
>>> _______________________________________________
>>> Tinyos-help mailing list
>>> [email protected]
>>> https://www.millennium.berkeley.edu/cgi-bin/mailman/listinfo/tinyos-help
>>
>> --
>> Eric B. Decker
>> Senior (over 50 :-) Researcher

--
Eric B. Decker
Senior (over 50 :-) Researcher
