Re: Memmove causing program crashes, giving SIGTRAP in GDB(?)

KENNON J CONRAD via Cygwin Fri, 27 Feb 2026 18:31:34 -0800

Hi Brian,

I just wanted to add that the stash-and-store idea you suggested, which
memmove also uses, has a very nice impact on the assembly code.


With the old code that does this for the last 0 to 7 words:
        while (candidate_ptr > score_ptr) {
          *candidate_ptr = *(candidate_ptr - 1);
          candidate_ptr--;
        }

the assembly code shows this from the point where the move starts:
.L24:
        movdqu  -16(%rax), %xmm1
        subq    $16, %rax
        movups  %xmm1, 2(%rax)
        cmpq    %rdx, %rax
        jnb     .L24
        movq    %r10, %rax
        subq    %r9, %rax
        subq    $16, %rax
        notq    %rax
        andq    $-16, %rax
        addq    %r10, %rax
        cmpq    %rax, %r9
        jnb     .L28
        movq    %rax, %rcx
        movq    %rax, %rdx
        movq    %r9, 48(%rsp)
        subq    %r9, %rcx
        subq    $1, %rcx
        shrq    %rcx
        leaq    2(%rcx,%rcx), %r8
        negq    %rcx
        subq    %r8, %rdx
        leaq    (%rax,%rcx,2), %rcx
        call    memmove
        movq    48(%rsp), %r9
        jmp     .L28

But with stash and store:
        *(uint64_t *)&candidates_index[new_score_rank + 1] = first_four;
        *(uint64_t *)&candidates_index[new_score_rank + 5] = next_four;

the assembly code from the point where the move starts is this:
.L24:
        movdqu  -16(%r9), %xmm1
        subq    $16, %r9
        movups  %xmm1, 2(%r9)
        cmpq    %rax, %r9
        jnb     .L24
        movups  %xmm0, 2(%rdi,%rdx)
        jmp     .L26

There are a couple of extra assembly instructions to stash into xmm0 before
the move, but this is a big reduction in assembly code size for the backward
memory move.  It is not as fast as memmove would be if the DF (direction
flag) weren't getting corrupted, but it is much better than the old code,
and it completely avoids the risk of DF corruption during rep movsq in
memmove for backward move sizes >= 8!  I like it because there is no need to
worry about whether rep movsb or rep movsw could also be vulnerable to DF
corruption.
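
For anyone following along, here is a minimal, self-contained sketch of the
stash-and-store pattern. It is not the actual program code: the helper name
shift_up_one and its parameters are made up for illustration, the element
type is assumed to be uint16_t, and memcpy stands in for the uint64_t casts
shown above so the sketch does not depend on alignment or aliasing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: shift a[pos .. pos+count-1] up by one slot, into
   a[pos+1 .. pos+count]. The caller must own a[pos+count]. */
static void shift_up_one(uint16_t *a, size_t pos, size_t count)
{
    assert(a != NULL);

    if (count < 8) {
        /* Too short to stash eight elements: plain backward copy. */
        for (size_t i = pos + count; i > pos; i--)
            a[i] = a[i - 1];
        return;
    }

    /* Stash the lowest eight elements before the loop can overwrite them. */
    uint64_t first_four, next_four;
    memcpy(&first_four, &a[pos], sizeof first_four);
    memcpy(&next_four, &a[pos + 4], sizeof next_four);

    /* Move whole blocks of eight from the top down. Each block is loaded
       in full before it is stored, because source and destination overlap. */
    size_t i = pos + count;
    while (i - pos >= 8) {
        uint16_t block[8];
        i -= 8;
        memcpy(block, &a[i], sizeof block);
        memcpy(&a[i + 1], block, sizeof block);
    }

    /* Store the stash one slot up. This covers the 0 to 7 elements the
       loop did not reach; any overlap rewrites identical values. */
    memcpy(&a[pos + 1], &first_four, sizeof first_four);
    memcpy(&a[pos + 5], &next_four, sizeof next_four);
}
```

The point of the final two stores is that they replace the variable-length
tail loop with a fixed-size write, so the compiler has no reason to emit a
call to memmove for the remainder.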

Best Regards,

Kennon

> On 02/27/2026 11:49 AM PST Brian Inglis via Cygwin <[email protected]> wrote:
> 
>  
> Hi Kennon,
> 
> Some perf reports and analysis imply that backward moves (with overlap?) are
> no faster than straight rep movsb on some CPUs, so it may be better to just
> simplify to that, unless you want to stash the final element(s) to be moved
> out of the way in register(s), and use multiple registers in unrolled wide
> moves for the aligned portion?
>

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
