On Sun, Jan 3, 2010 at 10:55 PM, Joshua Haberman <jhaber...@gmail.com> wrote:
> Richard Guenther <richard.guenther <at> gmail.com> writes:
>> On Sun, Jan 3, 2010 at 6:46 AM, Joshua Haberman <jhaberman <at> gmail.com> wrote:
>> > The aliasing policies that GCC implements seem to be more strict than
>> > what is in the C99 standard.  I am wondering if this is true or whether
>> > I am mistaken (I am not an expert on the standard, so the latter is
>> > definitely possible).
>> >
>> > The relevant text is:
>> >
>> >  An object shall have its stored value accessed only by an lvalue
>> >  expression that has one of the following types:
>> >
>> >  * a type compatible with the effective type of the object,
>> >  [...]
>> >  * an aggregate or union type that includes one of the aforementioned
>> >    types among its members (including, recursively, a member of a
>> >    subaggregate or contained union), or
>>
>> Literally interpreting this sentence the way you do removes nearly all
>> advantages of type-based aliasing that you have when dealing with
>> disambiguating a pointer dereference vs. an object reference
>> and thus cannot be the desired interpretation (and thus we do not allow 
>> this).
>
> Thank you for the information.  I am very interested in distilling this
> issue into a concise and easy to understand guideline that C and C++
> programmers can use to determine whether they are following the rules
> correctly or not, especially since the warnings are not perfect.  The
> GCC manpage gives a basic rule:
>
>  In particular, an object of one type is assumed never to reside at the
>  same address as an object of a different type, unless the types are
>  almost the same.  For example, an "unsigned int" can alias an "int",
>  but not a "void*" or a "double".  A character type may alias any other
>  type.
>
> However, this explanation does not address how the rule applies to
> aggregates (structures and arrays) and unions.  Here is my attempt;
> please correct anything that looks wrong.
>
> The best way I have had this explained to me so far is that
> dereferencing "upcasted" pointers is ok, but dereferencing "downcasted"
> pointers is not.
> For the purposes of this explanation only, we define "upcasts" and
> "downcasts" as:
>
>  struct A { int x; } a;
>  int i;
>
>  int *pi = &a.x;  // upcast
>  int foo = *pi;   // ok
>
>  struct A *pa = (struct A*)&i;  // downcast
>  int bar = pa->x;    // NOT ok
>  struct A a2 = *pa;  // NOT ok
>
> A distinguishing feature of the downcast is that it requires an actual
> cast.  So in general, casts from one pointer type to another indicate
> a likely problem.  Pointer casts *can* be valid, but only if you know
> that the object was previously written as the casted-to type:
>
>  struct A { int x; } a;
>  int i;
>
>  int *pi = &a.x;  // upcast
>  // this downcast is just "undoing" the previous upcast.
>  struct A *pa = (struct A*)pi;
>  int foo = pa->x;  // ok
>
> This is why perfect warnings about this issue are not possible; if we
> see a downcast in isolation, we do not know if it is undoing a previous
> upcast or not.  Only a tool like valgrind could check this perfectly, by
> observing reads and writes at runtime and checking the types of pointers
> that were used to perform the read/write.

Correct (though valgrind operates at too low a level to know access types).
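
For illustration, here is a minimal sketch of the valid round trip described
above, written as complete functions.  The names a_from_x and round_trip are
made up for this example.  It relies on x being the first member of struct A,
so that a pointer to the member and a pointer to the struct refer to the same
address.

```c
#include <assert.h>
#include <stddef.h>

struct A { int x; };

/* Hypothetical helper: the caller promises that px points at the x member
   of a real struct A.  Only under that promise is the cast valid. */
struct A *a_from_x(int *px)
{
    assert(px != NULL);
    return (struct A *)px;
}

/* Round trip: "upcast" to int*, then "downcast" back to struct A*. */
int round_trip(int value)
{
    struct A a = { value };
    int *pi = &a.x;               /* upcast, no cast needed */
    struct A *pa = a_from_x(pi);  /* downcast undoing the upcast */
    return pa->x;                 /* ok: the object really is a struct A */
}
```

Passing the address of a plain int to a_from_x and then reading pa->x would be
the "NOT ok" case from the first example.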

> It is possible in C (not C++) to run into trouble even without pointer
> casts, since void* can assign to any pointer type without a cast:

void * is the same as any pointer to an incomplete type, like
struct A;
(struct A *)&i;

The good thing is that you cannot dereference this kind of pointer.

>  int i;
>  void *voidp = &i;
>  // Effective downcast.
>  struct A *pa = voidp;
>  int foo = pa->x;  // NOT ok
>
> But since chars can alias anything, it is always allowed to read or
> write an object's representation via char*.
>
>  int i;
>  char ch = *(char*)&i;  // ok

correct.
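
A small sketch of the same idea for a whole object, assuming you want to walk
every byte of the representation.  The function names are invented for this
example; every access goes through an lvalue of type unsigned char.

```c
#include <assert.h>
#include <stddef.h>

/* Sums the bytes of an object's representation.  All reads use an
   unsigned char lvalue, which may alias any object type. */
unsigned byte_sum(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    unsigned sum = 0;
    assert(obj != NULL);
    for (size_t n = 0; n < size; n++)
        sum += p[n];
    return sum;
}

/* Convenience wrapper for an int; the result does not depend on
   byte order because addition is commutative. */
unsigned int_byte_sum(int i)
{
    return byte_sum(&i, sizeof i);
}
```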

>  char charray[sizeof(long)] = {...};
>  long l = *(long*)charray;  // ok

not correct ;)  (the lvalue has to be of character type, yours is of
type 'long' - the type of the actual object does not matter)
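
If the goal is just to get the bytes of a char array into a long, one common
way to stay within the rule is to copy them with memcpy instead of
dereferencing a long*.  This is only a sketch; load_long and store_then_load
are names made up for the example.

```c
#include <assert.h>
#include <string.h>

/* Reads a long out of a char buffer without dereferencing a long*
   that points into the buffer.  The buffer must hold at least
   sizeof(long) bytes. */
long load_long(const char *bytes)
{
    long l;
    assert(bytes != NULL);
    memcpy(&l, bytes, sizeof l);  /* byte-wise copy, no long lvalue on charray */
    return l;
}

/* Stores a long into a char array, then loads it back. */
long store_then_load(long value)
{
    char charray[sizeof(long)];
    memcpy(charray, &value, sizeof value);
    return load_long(charray);
}
```

The only long lvalue here refers to the local variable l, which really is a
long; the char array is only ever touched as bytes.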

> This does not mean that casts to/from char* are always safe, for the
> same reason that we have to watch out for void*: the object may have
> previously been written as a different type.
>
> Besides observing the upcast/downcast rule, the other major rule is that
> pointers to union members may only be dereferenced for the *active*
> union member, which can only be set by using the union directly.
>
>  union U {
>    int i;
>    long l;
>  } u;
>  int *pi = &u.i;
>  long *pl = &u.l;
>
>  u.i = 5;
>  int foo = *pi;   // ok, u.i is the active member.
>  long bar = *pl;  // NOT ok, u.l is not the active member.

Correct.  C++ has the notion of dynamic types, so with C++

int i;
*(float *)&i = 0.0;
float f = *(float *)&i;

is ok (well - it's ok with a placement new, but a pointer cast is all
GCC sees here).  The store changes the dynamic type of the
memory stored to and thus further reads are only valid using
the same type.  GCC implements this also for C, but only starting
with GCC 4.5.
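
A hedged sketch of the "store changes the type" idea in plain C.  Note the
assumption: this uses malloc'ed storage rather than a declared int, because as
I read C99 6.5p6 only objects without a declared type take their effective
type from the last store.  The function name is invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Reuses one allocated block first as an int, then as a float.
   Returns the float read back, or -1.0f if allocation fails. */
float reuse_as_float(int first, float second)
{
    size_t size = sizeof(int) > sizeof(float) ? sizeof(int) : sizeof(float);
    void *mem = malloc(size);
    if (mem == NULL)
        return -1.0f;

    *(int *)mem = first;       /* store: effective type is now int */
    int i = *(int *)mem;       /* ok: read with the same type */
    assert(i == first);

    *(float *)mem = second;    /* store: effective type is now float */
    float f = *(float *)mem;   /* ok: read with the same type */
    /* reading *(int *)mem at this point would no longer be ok */

    free(mem);
    return f;
}
```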

Richard.

> Josh
>
>
