On 9/1/2012 9:59 AM, James Dennett wrote:
> On Fri, Aug 31, 2012 at 2:55 PM, John Nagle <na...@animats.com> wrote:
>> We have proposed an extension to C (primarily) and C++ (possibly)
>> to address buffer overflow prevention.  Buffer overflows are still
>> a huge practical problem in C, and much important code is still
>> written in C.  This is a new approach that may actually work.
...
> Could you say a little more of why it appears necessary to introduce
> references into C for this?  The reason I'm puzzled is that C already
> has the ability to pass arrays in a way that preserves their size
> (just pass the address of the array) -- what is it that references
> change in this picture that justifies such a radical change?  Could
> we just permit pointer-to-array-of-n elements to convert to
> pointer-to-array-of-(n-1) elements, and/or provide some way to slice
> explicitly?
That's an important point.  C99 already has variable-length array
parameters:

    int fn(size_t n, float vec[n]);

Unfortunately, when the parameter is received in the function body,
per N1570 §6.7.6.3p7:

    'A declaration of a parameter as "array of _type_" shall be
    adjusted to "qualified pointer to _type_", where the type
    qualifiers (if any) are those specified within the [ and ] of
    the array type derivation.'

What this means is that, in the body of the function, "vec" has type
"float *", and "sizeof vec" is the size of a pointer.  The standard
currently requires losing the size of the array.

While C99 variable-length array parameters aren't used much (searches
of open-source code have failed to find any use cases, Microsoft
refuses to implement them, and N1570 makes them optional), the same
semantics also apply to passing fixed-length arrays:

    int fn(float vec[4]);

As before, "vec" is delivered as "float *vec".  The constant case is
widely used, and changing the semantics there might silently break
existing code that uses "sizeof".  We had a go-round on this on
comp.std.c, and the conclusion was that changing the semantics of C
array passing would break too much.

The real reason for using references is that size information is
needed in places other than parameters.  It's needed in return types,
on the left side of assignments, in casts, and in structures.
References to arrays carry their size as part of the type; pointers
don't.

As for slicing, see "array_slice" in the paper.  It's not a built-in;
it's a macro that uses "decltype" and a cast to generate the
appropriate result type.  Personally, I'd like to have a Python-like
slicing notation:

    arr[start:endplus1]

but that's not essential to the proposal, so I'm not suggesting it.

> Of course to make this succeed you'll need buy-in from implementors
> and of the standards committee(s), who will need to trust that the
> other (and therefore that users) will find this worth the cost.  It
> generally takes a lot of work (in terms of robust specification and
> possibly implementation in a fork of an open source compiler or two)
> to generate the consensus necessary for a proposal to succeed.
> Something that might ultimately seek to change or even disallow much
> existing C code has an even higher bar -- getting an ISO committee to
> remove existing support is no small achievement (e.g., look at how
> long gets() persisted).  I'd love to see a reduction in the number of
> buffer overruns that are present in code, but it's an uphill
> struggle.

Of course.  Support may come from the security community.  CERT still
reports buffer overflows, usually in C/C++ code, as the single biggest
source of vulnerabilities.

Vulnerabilities in software are now a public-policy issue.  In the
last week, software attacks have taken down Saudi Aramco and RasGas,
two of the world's largest energy producers.  This issue is growing in
importance as "info-war" moves from a potential threat to reality.
It's now something that has to be fixed.

                John Nagle
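
To make the distinction above concrete, here is a minimal C++ sketch
(C++, because references to arrays already exist there; the proposal
would give C an equivalent).  It shows the parameter-adjustment rule
losing the array size, and a reference-to-array parameter keeping it.
The function names are illustrative only, not from the proposal.

    #include <cstdio>

    // Array parameter: adjusted to "float *vec", so inside the
    // function sizeof vec is the size of a pointer, not
    // 4 * sizeof(float).
    void by_pointer(float vec[4]) {
        std::printf("by_pointer:   sizeof vec = %zu\n", sizeof vec);
    }

    // Reference to array of 4 floats: the length is part of the type,
    // so sizeof vec is 4 * sizeof(float), and passing an array of any
    // other length is rejected at compile time.
    void by_reference(float (&vec)[4]) {
        std::printf("by_reference: sizeof vec = %zu\n", sizeof vec);
    }

    int main() {
        float a[4] = {1, 2, 3, 4};
        by_pointer(a);      // size information lost inside the callee
        by_reference(a);    // size information preserved in the type
        // float b[3] = {1, 2, 3};
        // by_reference(b); // would not compile: wrong array length
        return 0;
    }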