On Thu, Aug 06, 2015 at 08:36:36PM +0100, Richard Sandiford wrote:
> David Malcolm <dmalc...@redhat.com> writes:
> > On Wed, 2015-08-05 at 16:22 -0400, Trevor Saunders wrote:
> >> On Wed, Aug 05, 2015 at 11:34:28AM -0400, David Malcolm wrote:
> >> > On Wed, 2015-08-05 at 11:28 -0400, David Malcolm wrote:
> >> > > On Wed, 2015-08-05 at 13:47 +0200, Richard Biener wrote:
> >> > > > On Wed, Aug 5, 2015 at 12:57 PM, Trevor Saunders
> >> > > > <tbsau...@tbsaunde.org> wrote:
> >> > > > > On Mon, Jul 27, 2015 at 11:06:58AM +0200, Richard Biener wrote:
> >> > > > >> On Sat, Jul 25, 2015 at 4:37 AM, <tbsaunde+...@tbsaunde.org> wrote:
> >> > > > >> > From: Trevor Saunders <tbsaunde+...@tbsaunde.org>
> >> > > > >> >
> >> > > > >> >         * config/arc/arc.h, config/bfin/bfin.h, config/frv/frv.h,
> >> > > > >> >         config/ia64/ia64-protos.h, config/ia64/ia64.c, config/ia64/ia64.h,
> >> > > > >> >         config/lm32/lm32.h, config/mep/mep.h, config/mmix/mmix.h,
> >> > > > >> >         config/rs6000/rs6000.c, config/rs6000/xcoff.h, config/spu/spu.h,
> >> > > > >> >         config/visium/visium.h, defaults.h: Define ASM_OUTPUT_LABEL to
> >> > > > >> >         the name of a function.
> >> > > > >> >         * output.h (default_output_label): New prototype.
> >> > > > >> >         * varasm.c (default_output_label): New function.
> >> > > > >> >         * vmsdbgout.c: Include tm_p.h.
> >> > > > >> >         * xcoffout.c: Likewise.
> >> > > > >>
> >> > > > >> Just a general remark: the GCC output machinery is known to be slow,
> >> > > > >> so adding indirect calls might not be the best idea without
> >> > > > >> refactoring some of it.
> >> > > > >>
> >> > > > >> Did you do any performance measurements for artificial testcases
> >> > > > >> exercising the specific bits you change?
> >> > > > >
> >> > > > > sorry about the delay, but I finally got a chance to do some perf
> >> > > > > tests of the first patch.  I took three test cases (fold-const.ii,
> >> > > > > insn-emit.ii and a random .i from Firefox) and ran 3 trials, each
> >> > > > > timing 100 compilations.  The only non-default flag was -std=gnu++11.
> >> > > > >
> > [...snip results...]
> >> > > > >
> >> > > > > So, roughly, that looks to me like a range from improving by 0.5% to
> >> > > > > regressing by 1%.  I'm not sure what could cause an improvement, so I
> >> > > > > kind of wonder how valid these results are.
> >> > > > 
> >> > > > Hmm, indeed.  The speedup looks suspicious.
> >> > > > 
> >> > > > > Another question is how one can refactor the output machinery to be
> >> > > > > faster.  My first thought is to buffer text internally before calling
> >> > > > > stdio functions, but that seems like a giant job.
> >> > > > 
> >> > > > stdio functions are already buffering, so I don't know either.
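For concreteness, here is a minimal C++ sketch of Trevor's "buffer text internally before calling stdio" idea. The class name asm_buffer and its interface are hypothetical, not anything in GCC. Since stdio already buffers, as noted above, any gain would have to come from making fewer stdio calls (and so paying less per-call overhead, such as stream locking); that is an assumption, not something measured in this thread.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical sketch: collect assembler text in a private buffer and hand
// it to stdio in large chunks instead of one stdio call per fragment.
// Any speedup is assumed, not measured.
class asm_buffer
{
public:
  explicit asm_buffer (std::FILE *out, std::size_t capacity = 64 * 1024)
    : m_out (out), m_capacity (capacity)
  {
    assert (out != nullptr);
    m_buf.reserve (capacity);
  }

  // Flush whatever is left when the buffer goes away.
  ~asm_buffer () { flush (); }

  // Copying would flush the same text twice, so forbid it.
  asm_buffer (const asm_buffer &) = delete;
  asm_buffer &operator= (const asm_buffer &) = delete;

  // Append a NUL-terminated fragment, flushing first if it would push
  // the buffer past its capacity.
  void append (const char *s)
  {
    std::size_t len = std::strlen (s);
    if (m_buf.size () + len > m_capacity)
      flush ();
    m_buf.append (s, len);
  }

  // Write everything collected so far with a single fwrite call.
  void flush ()
  {
    if (!m_buf.empty ())
      {
        std::fwrite (m_buf.data (), 1, m_buf.size (), m_out);
        m_buf.clear ();
      }
  }

private:
  std::FILE *m_out;
  std::size_t m_capacity;
  std::string m_buf;
};
```

Callers would use append () wherever they now call fputs on asm_out_file; the hard part Trevor mentions is converting every such call site, not the buffer itself.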
> >> > > > 
> >> > > > But yes, going the libas route would improve things here, or for
> >> > > > example enhancing gas to be able to eat target binary data
> >> > > > without the need to encode it in printable characters...
> >> > > > 
> >> > > > .raw_data number-of-bytes
> >> > > > <raw data>
> >> > > > 
> >> > > > Of course, that makes it quite unparsable for editors ...
> >> > > 
> >> > > A middle-ground might be to do both:
> >> > > 
> >> > > .raw_data number-of-bytes
> >> > > <raw data>
> >> > 
> >> > Sorry, I hit "Send" too early; I meant something like this as a
> >> > middle-ground:
> >> > 
> >> >   .raw_data number-of-bytes
> >> >   <raw data>
> >> > 
> >> >   ; comment giving the formatted text
> >> > 
> >> > so that cc1 etc. still do the formatting work to produce the comment,
> >> > letting human readers see what the raw data is meant to be, while the
> >> > assembler doesn't have to do any work to parse it.
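A sketch of how cc1 might emit the middle-ground format above. The .raw_data directive does not exist in gas; it is the hypothetical directive proposed in this thread, and the exact layout assumed here (a newline after the raw bytes, ';' as the comment character, which in gas really varies by target) is a guess. Output goes to a std::string to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical emitter for the proposed (non-existent) ".raw_data" directive.
// Assumed layout:
//   \t.raw_data N\n   then exactly N raw bytes, then \n,
//   then a comment line holding the human-readable form.
void
emit_raw_data (std::string &out, const unsigned char *bytes, std::size_t n,
               const std::string &readable)
{
  out += "\t.raw_data ";
  out += std::to_string (n);
  out += '\n';
  // The assembler would copy these bytes verbatim without parsing them.
  out.append (reinterpret_cast<const char *> (bytes), n);
  out += '\n';
  // cc1 still pays to format this text, which is Trevor's objection below.
  out += "\t; ";
  out += readable;
  out += '\n';
}
```

For example, emitting the bytes 2a 00 00 00 with the comment ".long 42" produces the directive line, the four raw bytes, and a comment line a human can read; only the comment costs any formatting work.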
> >> 
> >> well, having random bytes in the file might still screw up editors, and
> >> I'd kind of expect that to be slower overall, since gcc still does the
> >> formatting, and both gcc and as do more IO.
> >> 
> >> > FWIW, I once had a go at hiding asm_out_file behind a class interface,
> >> > trying to build up higher-level methods on top of raw text printing.
> >> > Maybe that's a viable migration strategy (I didn't finish that patch).
> >> 
> >> I was thinking about trying that, but I couldn't think of a good way to
> >> do it incrementally.
> >> 
> >> Trev
> >
> > Attached is a patch from some experimentation, very much a
> > work-in-progress.
> >
> > It eliminates the macro ASM_OUTPUT_LABEL in favor of calls to a method
> > of an "output" object:
> >
> >   g_output.output_label (lab);
> >
> > g_output would be a thin wrapper around asm_out_file (with the
> > assumption that asm_out_file never changes to point at anything else).
> >
> > One idea here is to gradually replace uses of asm_out_file with methods
> > of g_output, giving us a possible approach for tackling the "don't
> > format so much and then parse it again" optimization.
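A minimal sketch of what such a thin wrapper around asm_out_file could look like, assuming (as David does) that the FILE * never changes. The class asm_output and its methods are hypothetical names. The point is that callers say what they want emitted, so the textual encoding could later be replaced in one place.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical thin wrapper around an assembler output stream.
// Callers state *what* to emit rather than printing assembler syntax
// themselves, so the encoding could later change behind this interface.
class asm_output
{
public:
  explicit asm_output (std::FILE *file) : m_file (file) {}

  // Mirrors the default ASM_OUTPUT_LABEL ("NAME:\n").  Real GCC goes
  // through assemble_name here; plain fputs is a simplification.
  void output_label (const char *name)
  {
    std::fputs (name, m_file);
    std::fputs (":\n", m_file);
  }

  // Emit a 4-byte integer.  The ".long" spelling is an assumption; the
  // real directive is target-specific.
  void output_int4 (long value)
  {
    std::fprintf (m_file, "\t.long\t%ld\n", value);
  }

  // Escape hatch for code that has not been converted yet.
  std::FILE *raw_file () const { return m_file; }

private:
  std::FILE *m_file;
};
```

The raw_file () escape hatch is what would make an incremental migration possible: unconverted code keeps writing text directly while converted code goes through the methods.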
> >
> > Another idea here is to use templates and specialization in place of
> > target macros, to capture things in the type system;
> > g_output is actually:
> >
> >   output<target_t> g_output;
> >
> > which has a default implementation of output_label corresponding to the
> > current default ASM_OUTPUT_LABEL:
> >
> > template <typename Target>
> > inline void
> > output<Target>::output_label (const char *name)
> > {
> >   assemble_name (name);
> >   puts (":\n");  
> > }
> >
> > ...but a specific Target traits class could have a specialization e.g.
> >
> > template <>
> > inline void
> > output<target_arm>::output_label (const char *name)
> > {
> >   arm_asm_output_labelref (name);
> > }
> >
> > This could give us (I hope) equivalent performance to the current
> > macro-based approach, but without using the preprocessor, albeit adding
> > some C++ (the non-trivial use of templates gives me pause).
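David's template sketch above can be made self-contained as follows. This is only an illustration: target_default and target_arm are empty tag types standing in for target traits classes, output writes to a std::string instead of asm_out_file, and the "$" prefix in the ARM specialization is invented to make the specialization visible; it is not what arm_asm_output_labelref does.

```cpp
#include <cassert>
#include <string>

// Empty tag types standing in for target traits classes (hypothetical).
struct target_default {};
struct target_arm {};

template <typename Target>
class output
{
public:
  explicit output (std::string &sink) : m_sink (sink) {}

  void output_label (const char *name);

private:
  std::string &m_sink;
};

// Default behaviour, corresponding to the generic ASM_OUTPUT_LABEL.
template <typename Target>
inline void
output<Target>::output_label (const char *name)
{
  m_sink += name;
  m_sink += ":\n";
}

// Explicit specialization for one target, chosen at compile time.
// The "$" decoration is made up for this example.
template <>
inline void
output<target_arm>::output_label (const char *name)
{
  m_sink += "$";
  m_sink += name;
  m_sink += ":\n";
}
```

A build would pick its target with something like using target_t = target_arm; and then instantiate output<target_t>. Every call resolves statically, so there is no indirect call, but each compiler binary is tied to one target.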
> 
> I might be missing the point, sorry, but it sounds like this enshrines
> the idea of having a single target.

I assume you are referring to the template part?  Not totally; see
https://blog.mozilla.org/nfroyd/2014/10/30/porting-rr-to-x86-64/
for an example of a tool that uses templates and supports multiple
targets at the same time.  That said, I'm not sure I see the
advantages, and the switch statements there look rather like virtual
functions.

> An integrated assembler or tighter asm output would be nice, but when
> I last checked, LLVM was usually faster than GCC even when compiling to
> asm, despite using indirection (in the form of virtual functions) for
> its output routines.  I don't think indirect function calls themselves
> are the problem -- as long as we get the abstraction right :-)
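For comparison, a sketch of the run-time alternative Richard Sandiford describes: one abstract interface with a virtual output_label and a subclass per target, selected when the compiler starts. All names here (asm_printer, prefixed_asm_printer, make_asm_printer) are hypothetical, and the "_" prefix is an invented target convention. Each call is an indirect call, but several targets can coexist in one binary without the single-target assumption of a global output<target_t>.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical abstract output interface with a default implementation.
class asm_printer
{
public:
  virtual ~asm_printer () = default;

  // Default label syntax: "NAME:\n".
  virtual void output_label (std::string &sink, const char *name) const
  {
    sink += name;
    sink += ":\n";
  }
};

// Made-up target whose labels carry a leading underscore.
class prefixed_asm_printer : public asm_printer
{
public:
  void output_label (std::string &sink, const char *name) const override
  {
    sink += "_";
    sink += name;
    sink += ":\n";
  }
};

// Select the implementation from a run-time target name.
std::unique_ptr<asm_printer>
make_asm_printer (const std::string &target)
{
  if (target == "prefixed")
    return std::unique_ptr<asm_printer> (new prefixed_asm_printer ());
  return std::unique_ptr<asm_printer> (new asm_printer ());
}
```

The per-label cost here is one virtual call; whether that matters is the question Richard Biener raised at the start of the thread, and the abstraction boundary (per label versus per larger chunk of output) probably matters more than the dispatch mechanism.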

yeah, last time I looked (admittedly a while ago), the C++ front end took
by far the largest part of the time, so output speed may not be terribly
important, but it would still be nice to figure out what a good design
looks like.

Trev

> 
> Thanks,
> Richard
