Yes, I still have the gdb session running for the crash.

Five threads are in the function "add_suffix".  The current source line in each 
of those threads (threads 1 - 5) is:
1.  node_ptr = &nodes[*sibling_node_num_ptr];
2.  while (node_ptr->child_node_num != 0) {
3.  new_node_ptr->sibling_node_num[1] = 0;
4.  return;
5.  if (*(node_symbol_ptr + 1) == *(in_symbol_ptr + 1)) {

One thread is in rank_scores_thread and is taking the SIGTRAP in the memmove 
function.

Mainline is in score_base_node_tree_cap at this line:
    node_instances = node_ptr->instances;

Threads 1 - 5 do not call memmove, memcpy or memset in the C code, although 
I'd need to check the assembly code to be sure the compiler hasn't inserted 
such calls.  Mainline does make some mem library calls, but only at points in 
the code where all other threads have exited.  So I don't immediately see 
anything that looks particularly suspect.
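To double-check the register dump quoted below, the eflags value gdb printed 
(0x246) can be decoded by hand.  This is a small sketch; the macro and function 
names are my own, not from any library, and the bit positions are the EFLAGS 
layout from Intel's Software Developer's Manual.  DF is bit 10 (mask 0x400), so 
0x246 is just the reserved bit 1 plus PF, ZF and IF, and DF really was clear 
when gdb stopped on the rep movsq.

```c
#include <stdint.h>
#include <stdio.h>

/* x86 EFLAGS bit masks (Intel SDM Vol. 1, EFLAGS register). */
#define FLAG_CF 0x0001u /* bit 0: carry */
#define FLAG_PF 0x0004u /* bit 2: parity */
#define FLAG_AF 0x0010u /* bit 4: auxiliary carry */
#define FLAG_ZF 0x0040u /* bit 6: zero */
#define FLAG_SF 0x0080u /* bit 7: sign */
#define FLAG_TF 0x0100u /* bit 8: trap (single-step) */
#define FLAG_IF 0x0200u /* bit 9: interrupt enable */
#define FLAG_DF 0x0400u /* bit 10: direction */
#define FLAG_OF 0x0800u /* bit 11: overflow */

/* Return nonzero if the direction flag is set in an EFLAGS/RFLAGS value. */
int df_is_set(uint64_t eflags)
{
    return (eflags & FLAG_DF) != 0;
}

/* Print the set flags in bit order, in the "[ PF ZF IF ]" style gdb uses.
   The reserved bit 1 (always 1) is not printed. */
void print_flags(uint64_t eflags)
{
    static const struct { uint32_t mask; const char *name; } flags[] = {
        { FLAG_CF, "CF" }, { FLAG_PF, "PF" }, { FLAG_AF, "AF" },
        { FLAG_ZF, "ZF" }, { FLAG_SF, "SF" }, { FLAG_TF, "TF" },
        { FLAG_IF, "IF" }, { FLAG_DF, "DF" }, { FLAG_OF, "OF" },
    };
    printf("[ ");
    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++)
        if (eflags & flags[i].mask)
            printf("%s ", flags[i].name);
    printf("]\n");
}
```

For example, print_flags(0x246) prints "[ PF ZF IF ]", matching the gdb output, 
and df_is_set(0x246) returns 0.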

For now I'm going to investigate this information from Google AI, since the 
errors are occurring on a Haswell-architecture i7-4790K:

Intel Haswell (and related architectures) processors may experience stability 
issues, including machine check errors (MCEs), due to a microcode bug related 
to REP MOVS (specifically REP MOVSB or REP MOVSQ) handling. These issues often 
cause system crashes or lockups, leading to microcode and BIOS/UEFI updates to 
resolve them.
Issue: A high rate of interrupts or specific memory operations can cause REP 
MOVS instructions to trigger Machine Check Errors (MCE) or internal errors 
(IERR) on older processors.
Affected Processors: The bug primarily impacts older Intel processors, 
including Haswell and Broadwell architectures.
Fix/Mitigation: The primary solution is to apply the latest motherboard 
BIOS/UEFI update, which contains the corrected microcode update (often labelled 
20180108 or later).
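For context on the std / rep movsq / cld sequence quoted below: memmove has to 
copy backward when the destination starts inside the source range, otherwise 
it would overwrite source bytes before reading them, and std is what makes 
rep movs run backward.  This is a simplified, byte-at-a-time sketch of that 
direction choice; sketch_memmove is a hypothetical name, and the real Cygwin 
memmove copies 8 bytes per step and may use different conditions for taking 
the backward path.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-at-a-time sketch of memmove's direction choice.
   The real implementation copies 8 bytes per step with rep movsq and
   brackets the backward case with std ... cld. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Unsigned wraparound: if dst < src the difference is huge, so only a
       destination that starts inside [src, src + n) takes the backward path. */
    if ((uintptr_t)d - (uintptr_t)s >= n) {
        /* Forward copy, the equivalent of rep movs with DF clear. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* Backward copy, the equivalent of rep movs with DF set (std):
           each source byte is read before the copy overwrites it. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

For example, with char buf[] = "abcdef", sketch_memmove(buf + 2, buf, 4) takes 
the backward path and leaves "ababcd"; a forward copy there would instead 
produce "ababab".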

Best Regards,

Kennon

> On 02/26/2026 1:42 PM PST Dimitry Andric <[email protected]> 
> wrote:
> 
>  
> If such a crash occurs, can you do a "thread apply all bt" in gdb? This will 
> show what all the other threads are doing. I'm betting some other thread is 
> calling memcpy or some other function that is messing with the direction flag.
> 
> -Dimitry
> 
> > On 26 Feb 2026, at 21:47, KENNON J CONRAD <[email protected]> wrote:
> > 
> > Yes, lots.  Seven threads were running at the point of the crash, with an 
> > 87% load on my i7-4790K.  I did a little research since the last post.  
> > The memmove code where the crash occurs is:
> > 
> >   0x00007ff96ba812a8 <+136>: std
> > => 0x00007ff96ba812a9 <+137>: rep movsq %ds:(%rsi),%es:(%rdi)
> >   0x00007ff96ba812ac <+140>: cld
> > 
> > This sets the direction flag immediately before the rep movsq and clears 
> > the direction flag immediately after the rep movsq.  Yet when gdb breaks it 
> > shows the direction flag is not set:
> > 
> > eflags         0x246               [ PF ZF IF ]
> > 
> > Would a forward move on overlapping data cause the SIGTRAP?  Could the 
> > code have moved to a different core?  Or could it have been interrupted by 
> > some other task that corrupts the flag?  As I mentioned earlier, the rep 
> > movsq fails only about once in several million memmove calls, so it seems 
> > likely to be something along those lines.
> > 
> > -Kennon
> > 
> > 
> >> On 02/26/2026 12:20 PM PST Dimitry Andric <[email protected]> 
> >> wrote:
> >> 
> >> 
> >> Is there some concurrency going on? Maybe some other part of the program 
> >> is flipping the direction flag?
> >> 
> >> -Dimitry
> >>

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple