On Thu, 2013-05-23 at 11:59 -0400, David Malcolm wrote: > On Thu, 2013-05-23 at 06:56 -0400, David Malcolm wrote: > > On Thu, 2013-05-23 at 07:14 +0200, Jakub Jelinek wrote: > > > On Wed, May 22, 2013 at 08:45:45PM -0400, David Malcolm wrote: > > > > I'm attempting to eliminate global state from the insides of gcc. > > > > > > > > gcc/tracer.c has various global variables, which are only used during > > > > the lifetime of the execute callback of that pass, and cleaned up at the > > > > end of each invocation of the pass. > > > > > > > > The attached patch introduces a class to hold the state of the pass > > > > ("tracer_state"), eliminating these globals. An instance of the state > > > > is created on the stack, and all of the various "static" functions in > > > > tracer.c that used the globals become member functions of the state. > > > > Hence the state is passed around by the implicit "this" of the > > > > tracer_state, avoiding the need to patch each individual use of a field > > > > within the state, minimizing the diff. > > > > > > But do we want to handle the global state this way? This adds overhead > > > to (almost?) every single function (now method) in the file (because it > > > gets > > > an extra argument). While that might be fine for rarely executed > > > functions, > > > if it involves also hot functions called many times, where especially on > > > register starved hosts it could increase register pressure, plus the > > > overhead of passing the this argument everywhere, this could start to be > > > noticeable. Sure, if you plan to do that just in one pass (but, why > > > then?), > > > it might be tiny slowdown, but after you convert the hundreds of passes in > > > gcc that contain global state it might become significant. > > > > > > There are alternative approaches that should be considered. > > > > I thought of a possible way of doing this, attached is a > > proof-of-concept attempt. > > > > The idea is to use (and then not use) C++'s "static" syntax for class > > methods and fields. By making that optional with a big configure-time > > switch, it gives us a way of making state be either global vs on-stack, > > with minimal syntax changes. In one configuration (for building gcc as > > a library) there would be implicit this-> throughout, but in the other > > (for speedy binaries) it would all compile away to global state, as per > > the status quo. > > > > This assumes that doing: > > > > tracer_state state; > > changed = state.tail_duplicate (); > > > > is legitimate; when using global state, "state" is empty, and the call > > to > > state.tail_duplicate () > > becomes effectively: > > state::tail_duplicate () > > since it's static in that configuration. > > > > > E.g. global state of a pass can be moved into a per-pass structure, > > > and have some way how to aggregate those per pass structures together from > > > all the passes in the whole compiler (that can be either manual process, > > > say each pass providing its own *-passstate.h and one big header including > > > all that together), or automatic ones (say gengstate or a new tool could > > > create those for us from special markings in the source, say new option on > > > GTY or something) and have some magic macro how to access the global state > > > within the pass (thispass->fieldname ?). Then e.g. depending on how the > > > compiler would be configured and built, thispass could be just address of > > > a > > > pass struct var (i.e. essentially keep the global state as is, for > > > performance reasons), or when trying to build compiler as a library (with > > > -fpic overhead we probably don't want for cc1/cc1plus - we can build all > > > the > > > *.o files twice, like libtool does) thispass could expand to __thread > > > pointer var dereference plus a field inside of the global compiler state > > > structure it points to for the current pass. Thus, the library version > > > of the compiler would be somewhat slower (both -fpic overhead and TLS > > > overhead), and would need either a few of the entrypoints tweaked to > > > adjust > > > the TLS pointer to the global state, or we could require users to just > > > call > > > a special function to make the global state current in the current thread > > > before calling compiler internals. > > > > Thanks. Though I thought we were trying to move away from relying on > > GTY parsing? (Sorry not to be able to answer more fully yet, need to > > get family ready for school...) > > I've warmed to your idea of having tooling to support state, and > creating a generic framework for this. For example, consider the > (long-term) use-case of embedding GCC's code as a library inside a > multithreaded app, where each thread could be JIT-compiling say OpenGL > shader code to machine code (perhaps with some extra non-standard > passes). To get there, I'd need to isolate *all* of GGC's state, and > when I look at, say, the garbage-collector, I shudder. > > So I'm interested in writing some compile-time tooling for generic > state-management in GCC, to sidestep the conflict between speed in the > status quo single state case vs support for multiple states. > > Here's a possible way of doing it: > > Given e.g. gcc/foo.c containing some state: > static some_type bar; > > then replace this with: > > DEFINE_STATE(some_type, bar); > > which is trivial to parse, and have a preprocessing tool autogenerate a > header in the builddir: > gcc/foo.c.state.h > that conditionally has something like this: > > #if SUPPORT_MULTIPLE_STATES > struct foo_c_state > { > some_type bar; > }; > # define bar ctxt->x_foo_c_bar; > /* there's a "context *ctxt;" variable somewhere, possibly > using TLS */ > > /* alternatively, we could do this as: > ctxt.x_foo_c_bar > and make changing context be an operation involving a memcpy, so the > single global ctxt gets its state overwritten by a copy. */ > > #else /* the single state case */ > > /* Just a global, as before: */ > some_type bar; > > #endif > > That way, any code that uses "bar" has equal meaning (after > preprocessing) in the !SUPPORT_MULTIPLE_STATES case to the status quo, > and thus ought to retain the performance for that use case. > > Probably need a DECLARE_STATE() macro also. > > (I didn't want to reuse GTY() since there's plenty of global state that > isn't GC-managed, and IIRC the plan is to move away from GTY) > > I can try prototyping this, if this approach sounds reasonable. Is this > the sort of thing that would be best done as a branch? (e.g. a feature > branch in git). > > BTW, I'm not convinced that TLS is the way to go for isolating these > details: what if someone wanted to parallelize a slow pass by > introducing a thread pool, optimizing each function on a different > thread? I think it's reasonable to require a specific API call on the > part of client code when changing context (OpenGL works this way iirc, > though it *does* use TLS). But I suppose that if the > DECLARE/DEFINE_STATE macros are in place, it becomes easier to > experiment with various implementations of state handling. > > In a similar vein, would "universe" be a better name than "context"? > "context" suggests threads to some people, whereas what I have in mind > is the ability to have multiple clients of a future gcc-as-a-library, > each client having the ability to have "parallel universes" of gcc > state; some clients might spread themselves across several threads, but > want to see the same universe. If GTY/PCH etc are layered on top of the > state management code, then clearly you can't share ptrs between states, > and the "parallel universe" metaphor is perhaps clearer. > > Hope this is constructive (and that I'm vaguely on the right track here) > Dave
I wrote a crude prototype of this, and am attaching in patch form (which includes a mix of handwritten and generated files for ease of seeing what's going on). It uses a Python script (gcc/genstate.py) to parse the state from .c files (run by hand). I converted tracer.c and ggc-page.c It compiles, but doesn't yet run (I haven't written the initialization code). The universe type's fields are *pointers* to the underlying state structs to avoid having it depend on every header file, and to avoid having to expose previously-private types; the generated headers are only used by their corresponding file. All that's exposed of each state struct is its sizeof, so that universe instances can malloc these at startup. It doesn't handle state that has initializers (e.g. the pass instances). Dave
diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 63d114b..5f1bc0c 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1461,6 +1461,7 @@ OBJS = \ tree-vectorizer.o \ tree-vrp.o \ tree.o \ + universe.o \ valtrack.o \ value-prof.o \ var-tracking.o \ diff --git a/gcc/genstate.py b/gcc/genstate.py new file mode 100644 index 0000000..503c07d --- /dev/null +++ b/gcc/genstate.py @@ -0,0 +1,117 @@ +#!/usr/bin/env python +from collections import namedtuple +import re +import sys + +open_paren = r'\(' +close_paren = r'\)' +opt_ws = r'\s*' +grp_tokens = '(.+)' + +class StaticState(namedtuple('StaticState', ('kind', 'name'))): + PATTERN = \ + ('DEFINE_STATIC_STATE' + open_paren + opt_ws + + grp_tokens + opt_ws + ',' + opt_ws + + grp_tokens + opt_ws + close_paren) + pattern = re.compile(PATTERN) + + def write_struct_field(self, f): + f.write(' %s x_%s;\n' % (self.kind, self.name)) + +class StaticStateArray(namedtuple('StaticStateArray', ('kind', 'name', 'nelt'))): + PATTERN = \ + ('DEFINE_STATIC_STATE_ARRAY' + open_paren + opt_ws + + grp_tokens + opt_ws + ',' + opt_ws + + grp_tokens + opt_ws + ',' + opt_ws + + grp_tokens + opt_ws + close_paren) + pattern = re.compile(PATTERN) + + def write_struct_field(self, f): + f.write(' %s x_%s[%s];\n' % (self.kind, self.name, self.nelt)) + +class SourceFile: + def __init__(self, path): + self.path = path + self.vars = [] + with open(path) as f: + for line in f: + m = StaticState.pattern.match(line) + if m: + self.vars.append(StaticState(*m.groups())) + m = StaticStateArray.pattern.match(line) + if m: + self.vars.append(StaticStateArray(*m.groups())) + + @property + def struct_name(self): + return 'state_%s_def' % self.field_name + + @property + def field_name(self): + """ + Get the fieldname for this source file within (struct universe_def) + """ + result = self.path + for ch in '.-': + result = result.replace(ch, '_') + return result + + def write_struct(self, f): + f.write('struct %s\n' % self.struct_name) + f.write('{\n') + for var in self.vars: + var.write_struct_field(f) + f.write('};\n') + + def write_macros(self, f): + for var in self.vars: + f.write('#define %s \\\n' % var.name) + f.write(' universe.%s->x_%s\n' % (self.field_name, var.name)) + + def write_sizeof_decl(self, f): + f.write('extern size_t get_sizeof_%s ();\n' % self.struct_name) + + def write_sizeof_impl(self, f): + f.write('size_t get_sizeof_%s ()\n' % self.struct_name) + f.write('{\n') + f.write(' return sizeof (%s);\n' % self.struct_name) + f.write('}\n') + + def write_private_header(self, f): + warn_autogenerated(f) + f.write('#if !GLOBAL_STATE\n') + sf.write_struct(f) + f.write('\n') + sf.write_sizeof_impl(f) + f.write('\n') + sf.write_macros(f) + f.write('\n') + f.write('#endif /* !GLOBAL_STATE */\n') + +def warn_autogenerated(f): + f.write('/* This file is autogenerated (by genstate.py);' + ' do not edit! */\n') + +def write_universe_header(sfs, f): + warn_autogenerated(f) + f.write('#if !GLOBAL_STATE\n') + f.write('struct universe_def\n') + f.write('{\n') + for sf in sfs: + f.write(' struct %s *%s;\n' % (sf.struct_name, sf.field_name)) + f.write('};\n\n') + f.write('/* Functions for allocating state: */\n') + for sf in sfs: + sf.write_sizeof_decl(f); + f.write('#endif /* !GLOBAL_STATE */\n') + +sfs = [] +for path in ['tracer.c', 'ggc-page.c']: + sf = SourceFile(path) + sfs.append(sf) + with open(path + '.state.h', 'w') as f: + sf.write_private_header(f) + sf.write_private_header(sys.stdout) +with open('universe.h', 'w') as f: + write_universe_header(sfs, f) + write_universe_header(sfs, sys.stdout) diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c index 5b18468..ba2875e 100644 --- a/gcc/ggc-page.c +++ b/gcc/ggc-page.c @@ -224,24 +224,27 @@ static const size_t extra_order_size_table[] = { #define PAGE_ALIGN(x) (((x) + G.pagesize - 1) & ~(G.pagesize - 1)) +#include "global-state.h" +#include "universe.h" + /* The Ith entry is the number of objects on a page or order I. */ -static unsigned objects_per_page_table[NUM_ORDERS]; +DEFINE_STATIC_STATE_ARRAY(unsigned, objects_per_page_table, NUM_ORDERS) /* The Ith entry is the size of an object on a page of order I. */ -static size_t object_size_table[NUM_ORDERS]; +DEFINE_STATIC_STATE_ARRAY(size_t, object_size_table, NUM_ORDERS) /* The Ith entry is a pair of numbers (mult, shift) such that ((k * mult) >> shift) mod 2^32 == (k / OBJECT_SIZE(I)) mod 2^32, for all k evenly divisible by OBJECT_SIZE(I). */ -static struct +struct inverse_table_def { size_t mult; unsigned int shift; -} -inverse_table[NUM_ORDERS]; +}; +DEFINE_STATIC_STATE_ARRAY(inverse_table_def, inverse_table, NUM_ORDERS) /* A page_entry records the status of an allocation page. This structure is dynamically sized to fit the bitmap in_use_p. */ @@ -343,7 +346,7 @@ struct free_object #endif /* The rest of the global variables. */ -static struct globals +struct globals { /* The Nth element in this array is a page with objects of size 2^N. If there are any pages with free objects, they will be at the @@ -457,7 +460,11 @@ static struct globals /* The overhead for each of the allocation orders. */ unsigned long long total_overhead_per_order[NUM_ORDERS]; } stats; -} G; +}; + +DEFINE_STATIC_STATE(globals, G) + +#include "ggc-page.c.state.h" /* The size in bytes required to maintain a bitmap for the objects on a page-entry. */ diff --git a/gcc/ggc-page.c.state.h b/gcc/ggc-page.c.state.h new file mode 100644 index 0000000..b359421 --- /dev/null +++ b/gcc/ggc-page.c.state.h @@ -0,0 +1,25 @@ +/* This file is autogenerated (by genstate.py); do not edit! */ +#if !GLOBAL_STATE +struct state_ggc_page_c_def +{ + unsigned x_objects_per_page_table[NUM_ORDERS]; + size_t x_object_size_table[NUM_ORDERS]; + inverse_table_def x_inverse_table[NUM_ORDERS]; + globals x_G; +}; + +size_t get_sizeof_state_ggc_page_c_def () +{ + return sizeof (state_ggc_page_c_def); +} + +#define objects_per_page_table \ + universe.ggc_page_c->x_objects_per_page_table +#define object_size_table \ + universe.ggc_page_c->x_object_size_table +#define inverse_table \ + universe.ggc_page_c->x_inverse_table +#define G \ + universe.ggc_page_c->x_G + +#endif /* !GLOBAL_STATE */ diff --git a/gcc/global-state.h b/gcc/global-state.h new file mode 100644 index 0000000..d317897 --- /dev/null +++ b/gcc/global-state.h @@ -0,0 +1,19 @@ +/* Crude testing hack for switching between: + global state + vs + per-universe state + This would be a config option controlling the whole build, so that + you'd use the former for a standalone build of gcc, and the latter + when building the code for use as a dynamic library. */ +#define GLOBAL_STATE 0 + +#if GLOBAL_STATE +#define DEFINE_STATIC_STATE(TYPE, NAME) static TYPE NAME; +#define DEFINE_STATIC_STATE_ARRAY(TYPE, NAME, NELT) static TYPE NAME[NELT]; +#else +#define DEFINE_STATIC_STATE(TYPE, NAME) +#define DEFINE_STATIC_STATE_ARRAY(TYPE, NAME, NELT) + +extern struct universe_def universe; +#endif + diff --git a/gcc/tracer.c b/gcc/tracer.c index 975cadb..c346bf1 100644 --- a/gcc/tracer.c +++ b/gcc/tracer.c @@ -57,13 +57,18 @@ static edge find_best_successor (basic_block); static edge find_best_predecessor (basic_block); static int find_trace (basic_block, basic_block *); +#include "global-state.h" +#include "universe.h" + /* Minimal outgoing edge probability considered for superblock formation. */ -static int probability_cutoff; -static int branch_ratio_cutoff; +DEFINE_STATIC_STATE(int, probability_cutoff) +DEFINE_STATIC_STATE(int, branch_ratio_cutoff) /* A bit BB->index is set if BB has already been seen, i.e. it is connected to some trace already. */ -sbitmap bb_seen; +DEFINE_STATIC_STATE(sbitmap, bb_seen) + +#include "tracer.c.state.h" static inline void mark_bb_seen (basic_block bb) diff --git a/gcc/tracer.c.state.h b/gcc/tracer.c.state.h new file mode 100644 index 0000000..95ccc11 --- /dev/null +++ b/gcc/tracer.c.state.h @@ -0,0 +1,22 @@ +/* This file is autogenerated (by genstate.py); do not edit! */ +#if !GLOBAL_STATE +struct state_tracer_c_def +{ + int x_probability_cutoff; + int x_branch_ratio_cutoff; + sbitmap x_bb_seen; +}; + +size_t get_sizeof_state_tracer_c_def () +{ + return sizeof (state_tracer_c_def); +} + +#define probability_cutoff \ + universe.tracer_c->x_probability_cutoff +#define branch_ratio_cutoff \ + universe.tracer_c->x_branch_ratio_cutoff +#define bb_seen \ + universe.tracer_c->x_bb_seen + +#endif /* !GLOBAL_STATE */ diff --git a/gcc/universe.c b/gcc/universe.c new file mode 100644 index 0000000..7a43851 --- /dev/null +++ b/gcc/universe.c @@ -0,0 +1,7 @@ +#include "config.h" +#include "system.h" +#include "global-state.h" +#include "universe.h" + +struct universe_def universe; + diff --git a/gcc/universe.h b/gcc/universe.h new file mode 100644 index 0000000..969b415 --- /dev/null +++ b/gcc/universe.h @@ -0,0 +1,15 @@ +/* This file is autogenerated (by genstate.py); do not edit! */ +#define GLOBAL_STATE 0 +#if !GLOBAL_STATE +struct universe_def +{ + struct state_tracer_c_def *tracer_c; + struct state_ggc_page_c_def *ggc_page_c; +}; + +extern struct universe_def universe; + +/* Functions for allocating state: */ +extern size_t get_sizeof_state_tracer_c_def (); +extern size_t get_sizeof_state_ggc_page_c_def (); +#endif /* !GLOBAL_STATE */