Hi all,

I implemented support for %fs and %gs segment prefixes on the x86 and
x86-64 platforms, in what turns out to be a small patch.

For those not familiar with them: at least on x86-64, %fs and %gs are
two special registers whose value a user program can ask to be added
to the address used by any machine instruction.  This is done with a
one-byte instruction prefix, "%fs:" or "%gs:".  The actual value
stored in these two registers cannot be modified quickly (at least
before the Haswell CPU), but the general idea is that they are rarely
modified.
Speed-wise, though, an instruction like "movq %gs:(%rdx), %rax" runs
at the same speed as a "movq (%rdx), %rax" would.  (I failed to
measure any difference, but I guess that the instruction is one more
byte in length, which means that a large quantity of them would tax
the instruction caches a bit more.)

For reference, the pthread library on x86-64 uses %fs to point to
thread-local variables.  gcc already has a number of special modes
that produce instructions like "movq %fs:(16), %rax" to load
thread-local variables (declared with __thread).  However, this
support is special-case only.  The %gs register is free to use.  (On
x86, %gs is used by pthread and %fs is free to use.)


So what I did is to add the __seg_fs and __seg_gs address spaces.
They are used like this, for example:

    typedef __seg_gs struct myobject_s {
        int a, b, c;
    } myobject_t;

You can then use variables of type "struct myobject_s *o1" as regular
pointers, and "myobject_t *o2" as %gs-based pointers.  Accesses to
"o2->a" are compiled to instructions that use the %gs prefix; accesses
to "o1->a" are compiled as usual.  These two pointer types are
incompatible.  The way you obtain %gs-based pointers, or control the
value of %gs itself, is outside the scope of gcc; you do that by using
the correct system calls and by manual arithmetic.  There is no
automatic conversion; the C code can contain casts between the three
address spaces (regular, %fs and %gs) which, like regular pointer
casts, are no-ops.


My motivation comes from the PyPy-STM project ("removing the Global
Interpreter Lock" for this Python interpreter).  In this project, I
want *almost all* pointer manipulations to resolve to different
addresses depending on which thread runs the code.  The idea is to use
mmap() tricks to ensure that the actual memory usage remains
reasonable, by sharing most of the pages (but not all of them) among
the threads' "segments".  So most accesses to a %gs-prefixed address
actually access the same physical memory in all threads; but not all
of them.  This gives me a dynamic way to have a large quantity of data
which every thread can read, and by occasionally changing the mapping
of a single page, I can make some changes thread-local, i.e.
invisible to other threads.
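
To illustrate the kind of mmap() trick meant here, a rough sketch
under stated assumptions: two views of one temporary file stand in
for two threads' segments, and one page of the second view is
remapped as private.  This is not necessarily what PyPy-STM does
internally; the struct and function names and NUM_PAGES are
hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

enum { NUM_PAGES = 4 };      /* hypothetical segment size, in pages */

struct segments {
    char  *seg0, *seg1;      /* two views of the same backing file */
    size_t page_size;
    int    fd;
};

/* Map one temporary file twice with MAP_SHARED, so that both views
   are backed by the same pages.  Returns 0 on success, -1 on failure. */
int segments_create(struct segments *s)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    s->fd = fileno(f);
    s->page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = NUM_PAGES * s->page_size;
    if (ftruncate(s->fd, (off_t)len) != 0)
        return -1;
    s->seg0 = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, s->fd, 0);
    s->seg1 = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, s->fd, 0);
    return (s->seg0 == MAP_FAILED || s->seg1 == MAP_FAILED) ? -1 : 0;
}

/* Replace one page of seg1 by a private (copy-on-write) mapping of the
   same file page: later writes through seg1 stay invisible to seg0. */
int segments_privatize(struct segments *s, size_t page_index)
{
    assert(page_index < (size_t)NUM_PAGES);
    size_t off = page_index * s->page_size;
    void *p = mmap(s->seg1 + off, s->page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_FIXED, s->fd, (off_t)off);
    return p == MAP_FAILED ? -1 : 0;
}
```

After segments_create(), a write through seg0 is visible through
seg1; after segments_privatize(&s, 1), a write to page 1 through seg1
is no longer seen through seg0, while the other pages stay shared.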

Of course, the same effect can be achieved in other ways, like
declaring a regular "__thread intptr_t base;" and adding the "base"
explicitly to every pointer access.  Clearly, this would have a large
performance impact.  The %gs solution comes at almost no cost.  The
patched gcc is able to compile the hundreds of MBs of (generated) C
code with systematic %gs usage and seems to work well (with one
exception, see below).
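
For comparison, the explicit-base alternative mentioned above could
look roughly like this (a sketch only; struct node, set_thread_base
and node_value are hypothetical names, and only the "__thread
intptr_t base" declaration comes from the text):

```c
#include <assert.h>
#include <stdint.h>

struct node { int value; };

/* Per-thread base address, added by hand to every access. */
static __thread intptr_t base;

/* Point the current thread at the start of its segment. */
void set_thread_base(const void *segment_start)
{
    base = (intptr_t)segment_start;
}

/* Every access pays for an extra thread-local load plus an addition;
   this is the work that the %gs: prefix does for free. */
int node_value(intptr_t offset)
{
    assert(offset >= 0);
    const struct node *n = (const struct node *)(base + offset);
    return n->value;
}
```

The same offset then resolves to different objects depending on the
base installed in the calling thread.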


Is there interest in that?  And if so, how should I proceed?

* The patch included here is very minimal.  It is against the
gcc_5_1_0_release branch but adapting it to "trunk" should be
straightforward.

* I'm unsure whether target_default_pointer_address_modes_p() should
return "true" or not in this situation: i386-c.c now defines more than
the default address space, but the new ones also use pointers of the
same standard size.

* One case in which this patched gcc miscompiles code is found in the
attached bug1.c/bug1.s.  (This case almost never occurs in PyPy-STM,
so I could work around it easily.)  I think that some early, pre-RTL
optimization is to "blame" here, possibly getting confused because the
nonstandard address spaces also use the same size for pointers.  Of
course it is also possible that I messed up somewhere, or that the
whole idea is doomed because many optimizations make a similar
assumption.  Hopefully not: it is the only issue I encountered.

* The extra byte needed for the "%gs:" prefix is not explicitly
accounted for.  Is it only by chance that I did not observe gcc
underestimating the size of the code it emits, and then e.g. using
jump instructions that would be rejected by the assembler?

* For completeness: this is very similar to clang's
__attribute__((address_space(256))) but a few details differ.  (Also,
not to discredit other projects on their competitor's mailing list,
but I had to fix three distinct bugs in llvm before I could use it.
That contributes to my having more trust in gcc...)


Links for more info about pypy-stm:

* http://morepypy.blogspot.ch/2015/03/pypy-stm-251-released.html
* https://bitbucket.org/pypy/stmgc/src/use-gcc/gcc-seg-gs/
* https://bitbucket.org/pypy/stmgc/src/use-gcc/c8/stmgc.h


Thanks for reading so far!

Armin
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c      (revision 223859)
+++ gcc/config/i386/i386.c      (working copy)
@@ -15963,6 +15963,20 @@
          fputs (" PTR ", file);
        }
 
+      /**** <AR> ****/
+      switch (MEM_ADDR_SPACE(x))
+       {
+       case ADDR_SPACE_SEG_FS:
+         fputs (ASSEMBLER_DIALECT == ASM_ATT ? "%fs:" : "fs:", file);
+         break;
+       case ADDR_SPACE_SEG_GS:
+         fputs (ASSEMBLER_DIALECT == ASM_ATT ? "%gs:" : "gs:", file);
+         break;
+       default:
+         break;
+       }
+      /**** </AR> ****/
+
       x = XEXP (x, 0);
       /* Avoid (%rip) for call operands.  */
       if (CONSTANT_ADDRESS_P (x) && code == 'P'
@@ -51816,6 +51830,120 @@
 }
 #endif
 
+
+/***** <AR> *****/
+
+/*** GS segment register addressing mode ***/
+
+static machine_mode
+ix86_addr_space_pointer_mode (addr_space_t as)
+{
+  gcc_assert (as == ADDR_SPACE_GENERIC ||
+             as == ADDR_SPACE_SEG_FS ||
+             as == ADDR_SPACE_SEG_GS);
+  return ptr_mode;
+}
+
+/* Return the appropriate mode for a named address space.  */
+static machine_mode
+ix86_addr_space_address_mode (addr_space_t as)
+{
+  gcc_assert (as == ADDR_SPACE_GENERIC ||
+             as == ADDR_SPACE_SEG_FS ||
+             as == ADDR_SPACE_SEG_GS);
+  return Pmode;
+}
+
+/* Named address space version of valid_pointer_mode.  */
+static bool
+ix86_addr_space_valid_pointer_mode (machine_mode mode, addr_space_t as)
+{
+  gcc_assert (as == ADDR_SPACE_GENERIC ||
+             as == ADDR_SPACE_SEG_FS ||
+             as == ADDR_SPACE_SEG_GS);
+  return targetm.valid_pointer_mode (mode);
+}
+
+/* Like ix86_legitimate_address_p, except with named addresses.  */
+static bool
+ix86_addr_space_legitimate_address_p (machine_mode mode, rtx x,
+                                     bool reg_ok_strict, addr_space_t as)
+{
+  gcc_assert (as == ADDR_SPACE_GENERIC ||
+             as == ADDR_SPACE_SEG_FS ||
+             as == ADDR_SPACE_SEG_GS);
+  return ix86_legitimate_address_p (mode, x, reg_ok_strict);
+}
+
+/* Named address space version of LEGITIMIZE_ADDRESS.  */
+static rtx
+ix86_addr_space_legitimize_address (rtx x, rtx oldx,
+                                   machine_mode mode, addr_space_t as)
+{
+  gcc_assert (as == ADDR_SPACE_GENERIC ||
+             as == ADDR_SPACE_SEG_FS ||
+             as == ADDR_SPACE_SEG_GS);
+  return ix86_legitimize_address (x, oldx, mode);
+}
+
+/* The default, SEG_FS and SEG_GS address spaces are all "subsets" of
+   each other. */
+static bool
+ix86_addr_space_subset_p (addr_space_t subset, addr_space_t superset)
+{
+  gcc_assert (subset == ADDR_SPACE_GENERIC ||
+             subset == ADDR_SPACE_SEG_FS ||
+             subset == ADDR_SPACE_SEG_GS);
+  gcc_assert (superset == ADDR_SPACE_GENERIC ||
+             superset == ADDR_SPACE_SEG_FS ||
+             superset == ADDR_SPACE_SEG_GS);
+  return true;
+}
+
+/* Convert from one address space to another: it is a no-op.
+   It is the C code's responsibility to write sensible casts. */
+static rtx
+ix86_addr_space_convert (rtx op, tree from_type, tree to_type)
+{
+  addr_space_t from_as = TYPE_ADDR_SPACE (TREE_TYPE (from_type));
+  addr_space_t to_as = TYPE_ADDR_SPACE (TREE_TYPE (to_type));
+
+  gcc_assert (from_as == ADDR_SPACE_GENERIC ||
+             from_as == ADDR_SPACE_SEG_FS ||
+             from_as == ADDR_SPACE_SEG_GS);
+  gcc_assert (to_as == ADDR_SPACE_GENERIC ||
+             to_as == ADDR_SPACE_SEG_FS ||
+             to_as == ADDR_SPACE_SEG_GS);
+
+  return op;
+}
+
+#undef TARGET_ADDR_SPACE_POINTER_MODE
+#define TARGET_ADDR_SPACE_POINTER_MODE ix86_addr_space_pointer_mode
+
+#undef TARGET_ADDR_SPACE_ADDRESS_MODE
+#define TARGET_ADDR_SPACE_ADDRESS_MODE ix86_addr_space_address_mode
+
+#undef TARGET_ADDR_SPACE_VALID_POINTER_MODE
+#define TARGET_ADDR_SPACE_VALID_POINTER_MODE ix86_addr_space_valid_pointer_mode
+
+#undef TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P
+#define TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P \
+  ix86_addr_space_legitimate_address_p
+
+#undef TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS
+#define TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS \
+  ix86_addr_space_legitimize_address
+
+#undef TARGET_ADDR_SPACE_SUBSET_P
+#define TARGET_ADDR_SPACE_SUBSET_P ix86_addr_space_subset_p
+
+#undef TARGET_ADDR_SPACE_CONVERT
+#define TARGET_ADDR_SPACE_CONVERT ix86_addr_space_convert
+
+/***** </AR> *****/
+
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_RETURN_IN_MEMORY
 #define TARGET_RETURN_IN_MEMORY ix86_return_in_memory
Index: gcc/config/i386/i386.h
===================================================================
--- gcc/config/i386/i386.h      (revision 223859)
+++ gcc/config/i386/i386.h      (working copy)
@@ -2568,6 +2568,11 @@
 /* For switching between functions with different target attributes.  */
 #define SWITCHABLE_TARGET 1
 
+enum {
+  ADDR_SPACE_SEG_FS = 1,
+  ADDR_SPACE_SEG_GS = 2
+};
+
 /*
 Local variables:
 version-control: t
Index: gcc/config/i386/i386-c.c
===================================================================
--- gcc/config/i386/i386-c.c    (revision 223859)
+++ gcc/config/i386/i386-c.c    (working copy)
@@ -572,6 +572,9 @@
                               ix86_tune,
                               ix86_fpmath,
                               cpp_define);
+
+  cpp_define (parse_in, "__SEG_FS");
+  cpp_define (parse_in, "__SEG_GS");
 }
 
 
@@ -586,6 +589,9 @@
   /* Update pragma hook to allow parsing #pragma GCC target.  */
   targetm.target_option.pragma_parse = ix86_pragma_target_parse;
 
+  c_register_addr_space ("__seg_fs", ADDR_SPACE_SEG_FS);
+  c_register_addr_space ("__seg_gs", ADDR_SPACE_SEG_GS);
+
 #ifdef REGISTER_SUBTARGET_PRAGMAS
   REGISTER_SUBTARGET_PRAGMAS ();
 #endif
Attachment: bug1.c

typedef __seg_gs struct foo_s {
    int a[20];
} foo_t;


int sum1(foo_t *p)
{
    int i, total=0;
    for (i=0; i<20; i++)
        total += p->a[i];     // <= the %gs: prefix is correctly inserted
    return total;
}

int sum2(void)
{
    foo_t *p = (foo_t *)0x1234;
    int i, total=0;
    for (i=0; i<20; i++)
        total += p->a[i];     // <= this memory read is missing %gs:
    return total;
}

Attachment: bug1.s
Description: Binary data
