Hi,

(tl;dr: skip to “questions from me to the GCC developers”)

I’ve recently spotted the following in some code I inherited:

struct foo {
        char name[4];
        uint8_t len;
        uint8_t type;
};

static const struct foo fooinfo[] = {
        /* list of 43 members */
};

I’ve seen the obvious two-byte padding (on i386, with -Os)
and thought to restructure this into:

static const char fooname[][4] = { … };
static const uint8_t foolen[] = { … };
static const uint8_t footype[] = { … };

Colour me surprised when this made the code over fifty
bytes longer. After some debugging, I found that the
assembly code generated had “.align 32” in front of
each of the new arrays.

After some (well, lots) more debugging, I eventually
discovered -fdump-translation-unit (which, in the version
I was using, also worked for C, not just C++), which showed
me that the alignment was even recorded as 256 (GCC counts it in
bits there, which is the 32 bytes seen in the assembly).
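
To put a rough number on what ".align 32" costs for three small arrays, here is a minimal sketch. It assumes the three objects are placed back to back and that the first one starts at an aligned offset; align_up, layout_size and foo_sizes are names made up for this illustration, and the real figure depends on what else the linker puts around them.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align (a power of two, in bytes). */
static size_t align_up(size_t offset, size_t align)
{
        assert(align != 0 && (align & (align - 1)) == 0);
        return (offset + align - 1) & ~(align - 1);
}

/*
 * Total bytes occupied by n objects laid out back to back, each one
 * starting at a multiple of align; the first one starts at offset 0
 * (an assumption of this sketch).
 */
static size_t layout_size(const size_t *sizes, size_t n, size_t align)
{
        size_t end = 0, i;

        for (i = 0; i < n; i++)
                end = align_up(end, align) + sizes[i];
        return end;
}

/* Sizes in bytes of fooname, foolen and footype with 43 members each. */
static const size_t foo_sizes[3] = { 43 * 4, 43, 43 };
```

Under these assumptions, layout_size(foo_sizes, 3, 1) gives 258 bytes and layout_size(foo_sizes, 3, 32) gives 299 bytes, so the padding alone is in the same ballpark as the growth observed above.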

Lots of digging later, I found this gem in gcc/config/i386/i386.c:

int
ix86_data_alignment (tree type, int align)
{
  if (AGGREGATE_TYPE_P (type)
       && TYPE_SIZE (type)
       && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
       && (TREE_INT_CST_LOW (TYPE_SIZE (type)) >= 256
           || TREE_INT_CST_HIGH (TYPE_SIZE (type))) && align < 256)
    return 256;

Voilà, there we have my culprit – commenting this out resulted
in a 12-byte saving (not 42*2 bytes, as the code generated for e.g.
“strncmp(foo, fooname[i], (size_t)foolen[i])” is a bit less optimal
than for “strncmp(foo, fooinfo[i].name, (size_t)fooinfo[i].len)”,
but that’s okay)… and a 206-byte reduction for the rest of the
codebase.

Seeing that ix86_data_alignment() also contains amd64-specific
alignment, and that MMX stuff generally needs more alignment,
I first did this:

        && (TREE_INT_CST_LOW (TYPE_SIZE (type)) >= 256
-          || TREE_INT_CST_HIGH (TYPE_SIZE (type))) && align < 256)
+          || TREE_INT_CST_HIGH (TYPE_SIZE (type)))
+       && (TARGET_MMX || !optimize_size)
+       && align < 256)
     return 256;

The idea here being that both TARGET_SSE and TARGET_64BIT
enable TARGET_MMX, so the extra alignment is dropped only
for -Os builds that do not use MMX.
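
The patched condition can be played with outside of GCC using a small stand-alone model. This is only a sketch: model_data_alignment and its parameters are hypothetical names, plain integers and booleans stand in for the tree accessors and target flags, the TREE_INT_CST_HIGH case for very large objects is ignored, and the rest of the real function (not quoted above) is replaced by a plain "return align".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Stand-alone model (not GCC code) of the patched check.
 * size_bits and align are in bits, like the values the quoted
 * function compares against 256.
 */
static int model_data_alignment(unsigned long size_bits, int align,
    bool is_aggregate, bool target_mmx, bool optimize_size)
{
        assert(align > 0);
        if (is_aggregate && size_bits >= 256
            && (target_mmx || !optimize_size)
            && align < 256)
                return 256;     /* bump to 256 bits, i.e. 32 bytes */
        return align;           /* simplification: keep what we were given */
}
```

For a 43-byte array (344 bits) with byte alignment, the model keeps align at 8 only when optimize_size is set and target_mmx is not; in every other combination it still returns 256. Objects below 256 bits are never bumped by this check.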

Then I went into the svn history for this function and
discovered that its predecessor in gcc/config/i386/i386.h
(the DATA_ALIGNMENT(TYPE, ALIGN) macro) was added around
2000, before MMX support in GCC was even a thing, to “improve
floating point performance”, something that other architectures
apparently can do without.

Now I’m trying roughly this:

[…]
 ix86_constant_alignment (tree exp, int align)
 {
+  if (optimize_size && !TARGET_MMX)
+    return align;
[…]
 ix86_data_alignment (tree type, int align)
 {
+  if (optimize_size && !TARGET_MMX)
+    return align;
[…]
 ix86_local_alignment (tree type, int align)
 {
+  if (optimize_size && !TARGET_MMX)
+    return align;
[…]

This opens up some questions from me to the GCC developers
though:

– Is this safe to do? (My baseline here is 3.4.6, so
  if someone still remembers, please do answer, but
  the scope of this email in total goes beyond that.)

– Is this something that GCC trunk could benefit from?

– I’ve also been wondering whether this applies to
  regular strings (not arrays that technically are
  strings too) as well…

– Is the exclusion of MMX and 64BIT required? (Since
  this code has been there “ever” since even before
  MMX support landed in GCC, I fear that some of the
  “required alignment” is done inside this function
  instead of in other places.)

– Even better: is this something we could do for *all*
  platforms in general? Something like this, in gcc/varasm.c:

 #ifdef DATA_ALIGNMENT
-      align = DATA_ALIGNMENT (TREE_TYPE (decl), align);
+      if (!optimize_size)
+        align = DATA_ALIGNMENT (TREE_TYPE (decl), align);
 #endif

My aim here is to tighten the density (reduce the size
of the individual sections in the .o file and, ideally,
the file size of the final executable) of the generated
code for -Os while not breaking anything, and leaving
the case of not-Os completely alone.

Of course I’ll do a full rebuild of MirBSD (which uses
-Os in almost all code, only some legacy crap from the
1970s like AT&T nroff uses -O1 or even -O0 as the code
doesn’t conform to ISO C) to see if things break, but
I’m also interested in the bigger picture; besides, I
am invested in embedded systems (FreeWRT/OpenADK, but
also dietlibc, klibc, etc.), which love small code.

Thanks in advance,
//mirabilos
PS:  Please do Cc me, I’m not subscribed.
PPS: I’ve exchanged assignment papers with the FSF about GCC,
     so feel free to just commit anything, if it makes sense.
-- 
FWIW, I'm quite impressed with mksh interactively. I thought it was much
*much* more bare bones. But it turns out it beats the living hell out of
ksh93 in that respect. I'd even consider it for my daily use if I hadn't
wasted half my life on my zsh setup. :-) -- Frank Terbeck in #!/bin/mksh
