On Mon, Jun 08, 2015 at 01:55:47PM -0700, Richard Henderson wrote:
> On 06/04/2015 12:35 PM, Ondřej Bílka wrote:
> >char *strchr_c(char *x, unsigned long u);
> >#define strchr(x,c) \
> >(__builtin_constant_p(c) ? strchr_c (x, c * (~0ULL / 255)) : strchr (x,c))
> >
>
> Certainly not a universal win, especially for 64-bit RISC.  This
> constant can be just as expensive to construct as the original
> multiplication.
>
> Consider PPC64, where 4 insns are required to form this kind of
> replicated 64-bit constant, and 3 insns are required to replicate C.
>
> Then there's other RISC for which replicating C is easily done in
> parallel with the initial alignment checks.
>
That's another problem: these transformations depend on the platform, so
you need to maintain a table somewhere of what is profitable and what is
not.
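
For reference, the constant discussed above comes from multiplying the character by ~0ULL / 255, which is 0x0101010101010101, so the character ends up copied into every byte of a 64-bit word. A minimal sketch follows; the helper names replicate_byte and replicate_byte_selfcheck are made up for illustration and are not part of any libc.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a libc function): copy byte c into all
   eight bytes of a 64-bit word.  ~0ULL / 255 == 0x0101010101010101,
   so the product places c in every byte lane without any carries. */
static uint64_t replicate_byte(unsigned char c)
{
    return (uint64_t)c * (~0ULL / 255);
}

/* Small self-check: 'c' is 0x63 in ASCII. */
static void replicate_byte_selfcheck(void)
{
    assert(replicate_byte(0x63) == 0x6363636363636363ULL);
    assert(replicate_byte(0x00) == 0);
}
```

When the argument is a compile-time constant the compiler can fold the multiplication, which is the point of the __builtin_constant_p test in the macro; whether loading the folded 64-bit constant is cheaper than replicating at run time is exactly the platform-dependent question raised above.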

As these functions go, the picture is better than you describe. Users
frequently call strchr in a loop, so there is potential for savings:
about 75% of strchr calls happened within 128 cycles of the previous one,
which is evidence of that use case.

A second saving would be in the header checks. Unless you need to write,
the best way looks to be an initial check that s % 4096 < 4096 - 32, so
that reading 32 bytes cannot cross into the next page and fault. If gcc
could prove that at least 32 more bytes are allocated after s, it could
call a different entry point that skips this check.

I have a todo project to add an interface which transforms

    while (s = strchr (s + 1, 'c'))

into something like

    struct *strchrp = strchr_init (s, 'c');
    while (s = strchr_next (strchrp))

to avoid the overhead of repeated calls. The inline strchr_next will first
check a mask against the values in, say, the current 16 bytes, and only if
the character isn't there will it do a libcall.
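
A rough sketch of how such an interface might look, under several assumptions: the struct layout, the sketch_ prefixed names and the 4096-byte page size are all made up for illustration, the search starts at s itself rather than s + 1, and a plain byte loop stands in for the 16-byte mask check, which a real implementation would presumably do with a vector compare.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header check from the mail: a 32-byte read starting at addr stays
   inside one page.  The 4096-byte page size is an assumption. */
static int sketch_read32_stays_in_page(uintptr_t addr)
{
    return addr % 4096 < 4096 - 32;
}

/* Hypothetical iterator state; the mail does not define it. */
struct sketch_strchr_iter {
    const char *pos;   /* next byte to examine, NULL when finished */
    int c;             /* character being searched for */
};

static struct sketch_strchr_iter sketch_strchr_init(const char *s, int c)
{
    assert(s != NULL);
    struct sketch_strchr_iter it = { s, c };
    return it;
}

/* Record where the next search resumes; stop after the terminator. */
static const char *sketch_found(struct sketch_strchr_iter *it, const char *r)
{
    it->pos = (*r != '\0') ? r + 1 : NULL;
    return r;
}

static inline const char *sketch_strchr_next(struct sketch_strchr_iter *it)
{
    const char *p = it->pos;
    if (p == NULL)
        return NULL;

    /* Inline fast path: look at up to 16 bytes.  This byte loop is a
       portable stand-in for a 16-byte compare-and-mask. */
    for (int i = 0; i < 16; i++) {
        if (p[i] == (char)it->c)
            return sketch_found(it, p + i);
        if (p[i] == '\0') {
            it->pos = NULL;
            return NULL;
        }
    }

    /* Slow path: nothing in the first 16 bytes, fall back to the libcall. */
    const char *r = strchr(p + 16, it->c);
    if (r == NULL) {
        it->pos = NULL;
        return NULL;
    }
    return sketch_found(it, r);
}
```

With this sketch the loop becomes: struct sketch_strchr_iter it = sketch_strchr_init (s, 'c'); while ((s = sketch_strchr_next (&it)) != NULL) ... so closely spaced matches are found without leaving the caller.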