<unlurks>

I have to jump in on this thread. Traffic light controllers are a fun category 
of technical artifacts. The weatherproof boxes that the relays used to live in 
have stayed the same size for decades, but now the controllers just take a 
teeny tiny circuit board rattling around in this comparatively huge box. And 
it's full of software, dontcha know? So why not have lots of newfangled 
features? Curiously, the people who make the insides of the box have a WHOLE 
DIFFERENT way of thinking about "what a traffic light controller should do?" - 
the "insider" people are in the 21st century, while the "outsider" people are 
in the early 20th century. Lemme splain.

A particular traffic light controller that I tested in 2007 had an FTP server 
inside it. I have no idea why. So I tried fuzzing it. 5 minutes into the test, 
the test aborted because the DuT wouldn't restart anymore. Upon investigation, 
we discovered that a particular FTP sequence had triggered a bug that had a 
rather unfortunate (side-)effect: The flash file system of the traffic light 
controller was formatted or erased. As a bonus, the device also had crashed and 
it was awaiting a ZMODEM file download since it didn't have a boot image any 
more. We couldn't test anything else because we didn't have the special serial 
cable to (re-)install the OS. Fail-safe? Not hardly: Not when it has no 
software! It's a lump of highly refined sand, in a plastic case.
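
For anyone who hasn't done this kind of testing: mutation fuzzing boils down 
to taking valid protocol messages, breaking them in assorted ways, and 
watching whether the target survives. Below is a minimal Python sketch of the 
generation side only. It assumes a handful of hypothetical seed commands and 
four crude mutations (long argument, truncation, byte flip, appended junk). 
It is an illustration, not the tool used in the test above, and a real 
fuzzer knows far more about the FTP grammar. Each case would be sent on the 
control connection, with a liveness check against the DuT between cases.

```python
import random

# Hypothetical seed commands; a real fuzzer would cover the whole RFC 959 set.
SEED_COMMANDS = [b"USER anonymous", b"PASS guest", b"CWD /", b"LIST", b"RETR boot.img"]


def mutate(cmd: bytes, rng: random.Random) -> bytes:
    """Apply one randomly chosen mutation to a command line (without CRLF)."""
    choice = rng.randrange(4)
    if choice == 0:
        # Oversized argument: the classic buffer-length probe.
        return cmd + b" " + b"A" * rng.choice([256, 1024, 65536])
    if choice == 1:
        # Truncate at a random point, as a broken sender or network might.
        return cmd[: rng.randrange(len(cmd) + 1)]
    if choice == 2:
        # Overwrite one byte with a random value.
        data = bytearray(cmd)
        i = rng.randrange(len(data))
        data[i] = rng.randrange(256)
        return bytes(data)
    # Append oddities: path traversal, format string, NUL, injected command.
    return cmd + rng.choice([b" ../../..", b" %s%s%n", b" \x00", b"\r\nDELE *"])


def fuzz_cases(count: int, seed: int = 0) -> list:
    """Generate `count` mutated, CRLF-terminated FTP control lines."""
    rng = random.Random(seed)  # fixed seed so a crashing case can be replayed
    return [mutate(rng.choice(SEED_COMMANDS), rng) + b"\r\n" for _ in range(count)]


for case in fuzz_cases(5):
    print(case[:60])
```

The fixed seed matters: when the device falls over, you want to replay 
exactly the case that did it.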

There are many lessons here, not least of which is: Ship the device with the 
smallest possible attack surface! Why the heck was FTP enabled? Clearly this 
device had never been subjected to any negative testing. And these devices are 
meant to be networked, so that FTP bug will be tickled someday, I just don't 
know when. Yes, it was reported to the vendor, and no, I have no idea if they 
ever fixed it.

Also, in this thread I have seen several references to "fail-safe" or 
"redundancy" features. In my experience, those are often some of the weakest 
aspects of some systems. In one case, my testing rendered a 
multi-million-dollar highly redundant VoIP soft switch useless by constantly 
causing the primary to fail - and while the secondary was being activated, 
there was a quiet period of 2-3 seconds during which time no calls went 
through. Shortly after the secondary had become the primary, it failed again, 
continuing the cycle. Literally, traffic amounting to one packet (about 100 
bytes, IIRC) per second of carefully crafted SIP INVITEs could make this switch 
completely useless. The bug I found involved SIP INVITE messages that could not 
be filtered…unless you didn't want to accept VoIP phone calls at all, which 
calls into question your purchase of the multi-million-dollar highly redundant 
soft switch. That bug was fixed.

Software is tricky stuff. The number of ways it can fail is practically 
infinite, but there is generally only a small number of ways for it to work 
correctly. Networked software is particularly challenging to write because the 
software engineers don't get to control their inputs. The intervening network 
can (and does) fold, spindle, mutilate, truncate, drop, reorder or duplicate 
packets and your code on the receiving end has to try to understand what was 
intended by the sender. Oh, and the sender might be following an older version 
of the standard (if one even exists) or simply have included some bugs of their 
own. Because the coders are so focused on making their code do what the MRD/PRD 
required - on a tight schedule! - they have little time to imagine all the 
possible ways their code might fail. Their error-handling routines are simply 
never imaginative enough to handle real-world brokenness. It *is* possible to 
test this stuff, but time pressures in release schedules don't leave a lot of 
breathing room for developers to take on whole new classes of tasks that are 
outside their expertise (security testing). So you end up with a traffic light 
controller that erases its own flash file system when it receives a slightly 
strange but completely legal FTP command, or a highly redundant VoIP soft 
switch that is only good at ping-ponging from primary to secondary CPUs. Don't 
even get me started on problems I have found in carrier-class routers.
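
Here is a minimal Python sketch of the defensive attitude, assuming a 
simplified reading of the FTP control channel (RFC 959 lines end in CRLF and 
command verbs are three or four letters). The length limit, the ASCII-only 
rule, and the names parse_command and BadCommand are hypothetical. The point 
is that every strange input ends in one well-defined rejection, never in an 
undefined state.

```python
MAX_LINE = 512  # hypothetical limit on one control line, CRLF included


class BadCommand(Exception):
    """The single, well-defined way this parser says 'no'."""


def parse_command(line: bytes) -> tuple:
    """Split one FTP control line into (VERB, argument) or raise BadCommand."""
    if len(line) > MAX_LINE:
        raise BadCommand("line too long")
    if not line.endswith(b"\r\n"):
        raise BadCommand("no CRLF; truncated or mangled in transit?")
    body = line[:-2]
    if any(c in body for c in (b"\r", b"\n", b"\x00")):
        raise BadCommand("embedded CR, LF or NUL")
    try:
        # Simplification: RFC 2640 allows UTF-8 pathnames; this sketch does not.
        text = body.decode("ascii")
    except UnicodeDecodeError:
        raise BadCommand("non-ASCII bytes") from None
    verb, _, arg = text.partition(" ")
    verb = verb.upper()
    if not (3 <= len(verb) <= 4 and verb.isalpha()):
        raise BadCommand("malformed verb")
    return verb, arg


# Usage: every odd input ends in the same kind of rejection.
for raw in (b"retr notes.txt\r\n", b"RETR notes", b"RE\x00TR x\r\n", b"\xff\r\n"):
    try:
        print(parse_command(raw))
    except BadCommand as err:
        print("500 rejected:", err)
```

A truncated line, an embedded NUL or an oversized argument each raise 
BadCommand, which the caller can turn into a 500 reply before moving on.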

I don't need to name names: All software has bugs (except possibly the code in 
the main computers on the Space Shuttle). Every engineer I have ever known has 
tried to write their code well, but automated negative testing has only 
recently caught up to where the engineers and QA staff can focus on what they 
do best (write and test code that implements features that someone can buy), 
and let purpose-built tools do the negative testing for them, so their 
error-handling routines can be robust, too. Fixing bugs is generally 
straightforward. Finding them has always been the challenge.
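
The heart of automated negative testing is the oracle: what counts as a 
failure? Below is a minimal Python sketch. It assumes the only acceptable 
reaction to garbage is one designated exception type. Anything else is 
recorded along with the input that caused it. The negative_test function, its 
parameters and the naive_verb example target are hypothetical. Random bytes 
are also the crudest possible generator, since smarter tools generate 
protocol-aware cases. Note that this sketch does not detect hangs or a device 
that silently reboots, which a real harness must.

```python
import random
from typing import Callable, List, Tuple, Type


def negative_test(
    target: Callable[[bytes], object],
    allowed: Type[BaseException],
    iterations: int = 1000,
    seed: int = 0,
) -> List[Tuple[bytes, BaseException]]:
    """Feed random byte strings to `target`; return inputs that failed badly.

    A failure is any exception other than `allowed`.
    """
    rng = random.Random(seed)  # fixed seed so failures can be replayed
    failures = []
    for _ in range(iterations):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(64)))
        try:
            target(data)
        except allowed:
            pass  # clean, expected rejection
        except Exception as exc:  # anything else is an error-handling bug
            failures.append((data, exc))
    return failures


def naive_verb(line: bytes) -> bytes:
    """Hypothetical buggy target: IndexError on empty/whitespace-only input."""
    return line.split()[0]


for data, exc in negative_test(naive_verb, ValueError)[:3]:
    print(data, repr(exc))
```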

~tom

</unlurks>


On 23 Nov 2011, at 17:59 , Brett Frankenberger wrote:

> On Wed, Nov 23, 2011 at 05:45:08PM -0500, Jay Ashworth wrote:
>> 
>> Yeah.  But at least that's stuff you have a hope of managing.  "Firmware
>> underwent bit rot" is simply not visible -- unless there's, say, signature 
>> tracing through the main controller.
> 
> I can't speak to traffic light controllers directly, but at least some
> vital logical controllers do check signatures of their firmware and
> programming and will fail into a safe configuration if the
> signatures don't validate.
> 
>     -- Brett
> 

