Hi Aaron, On Thu, Sep 22, 2016 at 03:10:24PM -0500, Aaron Sawdey wrote: > The powerpc target had a movmemsi pattern which supports memcpy() but > did not have anything for memcmp(). This adds support for builtin > expansion of memcmp() into inline code for modest constant lengths. > Performance on power8 is in the range of 3-7x faster than calling > memcmp() for lengths under 40 bytes. > > Bootstrap on powerpc64le, regtest in progress, OK for trunk if no new > regressions?
[ You are testing on BE and 32-bit as well, thanks. ] Just a bit more formatting, and one nit: > 2016-09-22 Aaron Sawdey <acsaw...@linux.vnet.ibm.com> > PR target/77685 <<<--- please add this here > * config/rs6000/rs6000.md (cmpmemsi): New define_expand. > * config/rs6000/rs6000.c (expand_block_compare): New function used by > cmpmemsi pattern to do builtin expansion of memcmp(). Space before (, here and below. > (compute_current_alignment): Add helper function for > expand_block_compare used to compute alignment as the compare proceeds. > (select_block_compare_mode): Used by expand_block_compare to select > the mode used for reading the next chunk of bytes in the compare. > (do_load_for_compare): Used by expand_block_compare to emit the load > insns for the compare. > (rs6000_emit_dot_insn): Moved this function to avoid a forward > reference from expand_block_compare(). > * config/rs6000/rs6000-protos.h (expand_block_compare): Add a > prototype for this function. > * config/rs6000/rs6000.opt (mblock-compare-inline-limit): Add a new > target option for controlling how much code inline expansion of > memcmp() will be allowed to generate. You can just say "New function." and the like for most things here, fwiw. > +/* Figure out the correct instructions to generate to load data for > + block compare. MODE is used for the read from memory, and > + data is zero extended if REG is wider than MODE. If LE code > + is being generated, bswap loads are used. "Emit a load for a block compare", maybe? I.e. say it emits code, and it does only one load. > + /* final fallback is do one byte */ Capital, dot space space. > + /* If this is not a fixed size compare, just call memcmp */ Dot space space. > + /* This must be a fixed size alignment */ Dot space space. > + /* SLOW_UNALIGNED_ACCESS -- don't do unaligned stuff */ Full sentence. > + /* Anything to move? */ Question mark space space. > + int offset = 0; > + machine_mode load_mode = > + select_block_compare_mode (offset, bytes, base_align, word_mode_ok); > + int load_mode_size = GET_MODE_SIZE (load_mode); HOST_WIDE_INT instead of the ints? It isn't slower and I find it hard to convince myself int always works. > + int extra_bytes = load_mode_size - bytes; Here too. > + int remain = bytes - cmp_bytes; One more. Very nice improvement, thank you! This is okay for trunk. Segher