> > We now produce:
> >
> >         movq    b(%rip), %rsi
> >         movq    a(%rip), %rcx
> >         movq    (%rsi), %rax          <- first 8 bytes are moved
> >         leaq    8(%rcx), %rdi
> >         andq    $-8, %rdi             <- dest is aligned
> >         movq    %rax, (%rcx)
> >         movq    132(%rsi), %rax       <- last 8 bytes are moved
> >         movq    %rax, 132(%rcx)
> >         subq    %rdi, %rcx            <- alignment is subtracted from count
> >         subq    %rcx, %rsi            <- source is aligned
>
> This (source aligned) is not always true, but nevertheless the
> sequence is very tight.
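To make the strategy in the quoted sequence easier to follow, here is a rough C rendering of it. This is a sketch under stated assumptions, not what GCC actually emits. The function name is hypothetical, the buffers are assumed not to overlap, `n` is assumed to be at least 16 (the quoted code appears to copy 140 bytes: offset 132 plus 8), and the word loop merely stands in for whatever follows the final `subq`, since the quote stops there. The head and tail are copied with possibly unaligned 8-byte moves. The destination is then rounded up to the next 8-byte boundary, and the source is advanced by the same amount, so the source ends up aligned only when `src - dest` is a multiple of 8.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: sketch of the head/tail + aligned-middle copy.
   Assumes n >= 16 and non-overlapping buffers. */
static void copy_head_tail_aligned(unsigned char *dest,
                                   const unsigned char *src, size_t n)
{
    uint64_t tmp;

    /* movq (%rsi), %rax; movq %rax, (%rcx): first 8 bytes, maybe unaligned. */
    memcpy(&tmp, src, 8);
    memcpy(dest, &tmp, 8);

    /* movq 132(%rsi), %rax; movq %rax, 132(%rcx): last 8 bytes. */
    memcpy(&tmp, src + n - 8, 8);
    memcpy(dest + n - 8, &tmp, 8);

    /* leaq 8(%rcx), %rdi; andq $-8, %rdi: first 8-byte boundary after dest.
       skip is in 1..8, and those bytes were already written by the head copy. */
    uintptr_t d = (uintptr_t)dest;
    uintptr_t aligned = (d + 8) & ~(uintptr_t)7;
    size_t skip = (size_t)(aligned - d);

    /* Advance source and destination by the same amount and shrink the count. */
    unsigned char *dst = dest + skip;
    const unsigned char *s = src + skip;
    size_t remaining = n - skip;

    /* Copy whole 8-byte words into the aligned destination.  Any tail of
       fewer than 8 bytes is already covered by the last-8-bytes copy. */
    for (size_t i = 0; i + 8 <= remaining; i += 8) {
        memcpy(&tmp, s + i, 8);
        memcpy(dst + i, &tmp, 8);
    }
}
```

The overlapping stores at both ends write the same bytes twice, which is harmless and avoids any byte-by-byte prologue or epilogue.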
Yep, sure, it is aligned only if source-dest is aligned, but that is the best we can ask for.

> > Unfortunately the following testcase:
> >
> > #include <string.h>
> >
> > char *p, *q;
> >
> > void
> > t (int a)
> > {
> >   if (a < 100)
> >     memcpy (q, p, a);
> > }
> >
> > won't get inlined. This is because A is known to be smaller than 100, which
> > results in an anti-range after conversion to size_t. This anti-range allows
> > very large values (above INT_MAX), and thus we do not know the block size.
> > I am not sure if a sane range can be recovered somehow. If not, maybe
> > this is common enough to add support for a "probable" upper bound parameter
> > to the template.
>
> Do we know if there is real code that intentionally does that, other
> than security flaws resulting from an improperly done range check?

I do not think so.

> I think by default GCC should assume the memcpy size range is (0, 100)
> here, with perhaps an option to override it.

Indeed, this is what I was suggesting. The problem is what to pass down to the expanders as a value range. One option would be to update the documentation of the expanders to say that the ranges are only highly probable, but I do not want to do that since I want to use the ranges for move_by_pieces, too. So, if there is no other solution, I think we will have to introduce two upper-bound parameters: one certain and the other very likely. We play similar tricks in the niter code.

Honza
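The anti-range problem in the `t (int a)` testcase, and the "one certain, one very likely" upper bound idea, can be illustrated in plain C. This is a hedged sketch: `memcpy_size_from_int`, `struct size_bounds`, and `bounds_for_int_below` are hypothetical names for illustration, not GCC internals or the expander interface. Converting a negative `int` to `size_t` wraps it to a huge value, so `a < 100` only tells us the size lies in [0, 99] or in [SIZE_MAX - INT_MAX, SIZE_MAX]. The certain upper bound is therefore SIZE_MAX, while the probable one is 99.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the size memcpy receives in t () when called with A. */
static size_t memcpy_size_from_int(int a)
{
    /* Negative ints wrap to [SIZE_MAX - INT_MAX, SIZE_MAX]. */
    return (size_t)a;
}

/* Hypothetical pair of upper bounds, as proposed in the mail. */
struct size_bounds {
    size_t min_size;
    size_t max_size;          /* certain upper bound */
    size_t probable_max_size; /* very likely upper bound */
};

/* Bounds for the size (size_t) a, given only that int a < limit.
   Assumes limit >= 1; negative A is treated as improbable, not impossible. */
static struct size_bounds bounds_for_int_below(int limit)
{
    struct size_bounds b;
    b.min_size = 0;
    b.max_size = SIZE_MAX;                     /* negative a wraps around */
    b.probable_max_size = (size_t)limit - 1;   /* 0 <= a < limit */
    return b;
}
```

An expander given such a pair could, for example, pick an inline sequence tuned for the probable bound while keeping a fallback for sizes up to the certain one. This is one possible use and not a description of an existing GCC interface.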