On Thu, 13 Mar 2008, chromatic via RT wrote:

> On Thursday 13 March 2008 09:14:07 Andy Dougherty wrote:
> 
> > On Thu, 13 Mar 2008, Nicholas Clark via RT wrote:
> 
> > > Specifically, I am suspecting that if
> > >
> > >     offsetof(struct parrot_string_t, bufused) == sizeof(Buffer)
> > >
> > > matters, then something is either looking at or copying (sub)structures
> > > that happen to have padding, and in turn that padding happens to end up
> > > with bit patterns that have meaning in some other, larger (containing?)
> > > structure.
> 
> > Yes.  That's exactly my suspicion.  Strings are stored in "bufferlike"
> > pools, and many of the manipulations in src/headers.c involve
> > sizeof(Buffer), even though there is no actual "Buffer" inside a string
> > anymore.  To be fair, though, there's a *lot* more going on in parrot's
> > memory management that I just don't understand, and I have been unable
> > to pinpoint a specific assignment that is in error.
> 
> Originally I was going to ask "Why would there be padding at the end of a 
> Buffer?" but now I realize that the real question is "Is there padding in 
> parrot_string_t between flags and strstart?"

Actually, you should ask both questions.

There really can be padding at the end of Buffer.  The start of
every Buffer needs to be suitably aligned.  If you have an array of
Buffers, consecutive elements are exactly sizeof(Buffer) apart, so
sizeof(Buffer) has to be a multiple of Buffer's alignment.  The only
way to accomplish that is to have padding on the end of Buffer.

Also, no, there is no padding in parrot_string_t between flags and
strstart.  (At least on SPARC.)

The net result of these two answers is that

    offsetof(struct parrot_string_t, strstart) != sizeof(Buffer)

(assuming the current order of elements inside include/parrot/pobj.h,
in which strstart comes right after flags.)

Consider the following program (I've changed the parrot-specific types to 
their generic equivalents just so it's easier to compile).  The "Nested"
structure is what parrot used to have.  It now has the "Flat" structure.

    #include <stdio.h>
    #include <stddef.h>

    typedef union UnionVal {
        struct _b {                     /* One Buffer structure */
            void *     _bufstart;
            size_t     _buflen;
        } _b;
        struct _ptrs {          /* or two pointers, both are defines */
            void * _struct_val;
            double *      _pmc_val;
        } _ptrs;
        struct _i {
            int _int_val;               /* or 2 intvals */
            int _int_val2;
        } _i;
        double _num_val;            /* or one float */
        void * _string_val;     /* or a pointer to a string */
    } UnionVal;

    /* Parrot Object - base class for all others */
    typedef struct Buffer {
        UnionVal u;
        unsigned int flags;
    } Buffer;

    typedef struct Nested {
        Buffer       o;
        int      foo;
    } Nested_t;

    typedef struct Flat {
        UnionVal     cache;
        unsigned int flags;
        int      foo;
    } Flat_t;

    int main(int argc, char **argv)
    {
        /* sizeof and offsetof yield size_t, so print with %zu (C99) */
        printf("sizeof UnionVal = %zu\n", sizeof(UnionVal));
        printf("sizeof flags = %zu\n", sizeof(int));
        printf("sizeof Buffer = %zu\n", sizeof(Buffer));

        printf("offsetof(Nested_t, foo) = %zu\n", offsetof(Nested_t, foo));
        printf("offsetof(Flat_t, foo)   = %zu\n", offsetof(Flat_t,   foo));

        return 0;
    }

On Linux/x86, the output of this is

    sizeof UnionVal = 8
    sizeof flags = 4
    sizeof Buffer = 12
    offsetof(Nested_t, foo) = 12
    offsetof(Flat_t, foo)   = 12

On SPARC, the output of this is

    sizeof UnionVal = 8
    sizeof flags = 4
    sizeof Buffer = 16
    offsetof(Nested_t, foo) = 16
    offsetof(Flat_t, foo)   = 12

> It looks like the UnionVal is two pointers long, so if we rearranged things 
> such that flags comes first, would the Buffer structure get padded so that 
> anything after that in memory starts at the appropriate alignment for a 
> pointer?

There is no "Buffer" structure anymore inside a string.  However, if you
switched the "Flat" structure so that the flags came first and UnionVal
came second, then the compiler might stick padding inside the string
structure so that UnionVal is aligned.  This extra padding would indeed
change the SPARC output to

    sizeof flags = 4
    sizeof UnionVal = 8
    sizeof Buffer = 16
    offsetof(Nested_t, foo) = 16
    offsetof(Flat_t, foo)   = 16

so that once again 'foo' would be at the same position in either the
Nested or the Flat versions.  (Of course all code that assumes that
PObj_bufstart() points to the beginning of any "bufferlike" object
would have to be changed, but that's a separate issue.)

Of course this throws away the space savings obtained by getting rid of
the nested structure without regaining any of the benefits of the Nested
structure.

More generally, you can rely on the compiler to ensure that elements
within a structure are suitably aligned for their declared uses.  Inside
parrot_string_t, strstart is already suitably aligned for a pointer.
Any padding required is automatically supplied by the compiler.
Where it matters is when you try to use the memory allocated for one
structure in the place of another structure.  Then you have to be sure
the structures agree on things.  (And in a virtual machine, you
naturally end up doing stuff like that.)

So, to return to my original point:

> > Strings are stored in "bufferlike"
> > pools, and many of the manipulations in src/headers.c involve
> > sizeof(Buffer), even though there is no actual "Buffer" inside a string

I don't know if those calculations are still correct, now that strings
are not "bufferlike".

-- 
    Andy Dougherty              [EMAIL PROTECTED]
