On 24 Feb 2026, at 19:38, KENNON J CONRAD via Cygwin <[email protected]> wrote:
> 
>   I am having a problem that is apparently related to memmove and am 
> looking for some advice on how to investigate further.  This winter I have 
> been working to simplify GLZA source code and make it more readable.  GLZA is 
> an advanced open source code straight line grammar compressor first released 
> in 2015.  Among these changes was replacing some rather bloated code with 
> memmove and memset in various locations.  The program started crashing 
> occasionally and after extensively reviewing the changes, I was unable to 
> find a cause for these crashes.  So I installed gdb to try to find out what 
> was going on and was apparently able to find the cause of the problem.  As a 
> new gdb user, I am not very comfortable with trusting the results of what gdb 
> is showing, but it is pointing directly to one of the code changes I made.  I 
> backed out of this code change and the program has not crashed after 3 days 
> of nearly continuous testing.
> 
>   So here is what gdb reports when backtrace is run immediately after 
> reporting a "SIGTRAP":
> 
> (gdb) bt full
> #0 0x00007ff9dd8aa98b in KERNELBASE!DebugBreak () from 
> /cygdrive/c/Windows/system32/KERNELBASE.dll
> No symbol table info available.
> #1 0x00007ff9ca3b6417 in cygwin1!.assert () from 
> /cygdrive/c/Windows/cygwin1.dll
> No symbol table info available.
> #2 0x00007ff9ca3cfb18 in secure_getenv () from /cygdrive/c/Windows/cygwin1.dll
> No symbol table info available.
> #3 0x00007ff9e03dd82d in ntdll!.chkstk () from 
> /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> No symbol table info available.
> #4 0x00007ff9e038916b in ntdll!RtlRaiseException () from 
> /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> No symbol table info available.
> #5 0x00007ff9e03dc9ee in ntdll!KiUserExceptionDispatcher () from 
> /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> No symbol table info available.
> #6 0x00007ff9ca3b12a9 in memmove () from /cygdrive/c/Windows/cygwin1.dll
> No symbol table info available.
> #7 0x0000000100409a7c in rank_scores_thread (arg=0x6ffece890010) at 
> GLZAcompress.c:904
> new_score_rank = 2633
> new_score_lmi2 = 183964750
> new_score_pmi2 = 183964725
> rank = 4380
> max_rank = 2633
> num_symbols = 25
> new_score_lmi = 92079851
> new_score_pmi = 92079826
> thread_data_ptr = 0x6ffece890010
> max_scores = 4883
> candidates_index = 0xa00034470
> score_index = 4380
> node_score_num_symbols = 7
> num_candidates = 4381
> node_ptrs_num = 12224
> local_write_index = 12225
> rank_scores_buffer = 0x6ffece890020
> candidates = 0x6ffece990020
> score = 47.6283531
> #8 0x00007ff9ca412eec in cygwin1!.getreent () from 
> /cygdrive/c/Windows/cygwin1.dll
> No symbol table info available.
> #9 0x00007ff9ca3b47d3 in cygwin1!.assert () from 
> /cygdrive/c/Windows/cygwin1.dll
> No symbol table info available.
> #10 0x0000000000000000 in ?? ()
> No symbol table info available.
> 
> GLZAcompress.c line 904 is as follows and is in code that runs as a separate 
> thread created in main:
> memmove(&candidates_index[new_score_rank+1], 
> &candidates_index[new_score_rank], 2 * (rank - new_score_rank));
> This does point directly to where a code change was made.
> 
> candidates_index is allocated in main and not ever intentionally changed 
> until deallocated at the end of program execution:
> if (0 == (candidates_index = (uint16_t *)malloc(max_scores * 
> sizeof(uint16_t))))
>  fprintf(stderr, "ERROR - memory allocation failed\n");
> This value is passed to the thread in a structure pointed to by the thread 
> arg.  The value 0xa00034470 for candidates_index is similar to what is 
> reported on subsequent runs with added code to print this value so I don't 
> think it's corrupted, but would need to duplicate the crash after checking 
> the initial value to be 100% certain.  With gdb reporting that rank = 4380 
> and new_score_rank = 2633 at the time of the SIGTRAP, this should be a 
> backward move of 1747 uint16_t values by 2 bytes with a 2 byte difference 
> between the source and destination addresses.
> 
> Prior to this code change and for the last 3 days I have been using this code 
> instead and not seen any crashes:
> uint16_t * score_ptr = &candidates_index[new_score_rank];
> uint16_t * candidate_ptr = &candidates_index[rank];
> while (candidate_ptr >= score_ptr + 8) {
> *candidate_ptr = *(candidate_ptr - 1);
> *(candidate_ptr - 1) = *(candidate_ptr - 2);
> *(candidate_ptr - 2) = *(candidate_ptr - 3);
> *(candidate_ptr - 3) = *(candidate_ptr - 4);
> *(candidate_ptr - 4) = *(candidate_ptr - 5);
> *(candidate_ptr - 5) = *(candidate_ptr - 6);
> *(candidate_ptr - 6) = *(candidate_ptr - 7);
> *(candidate_ptr - 7) = *(candidate_ptr - 8);
> candidate_ptr -= 8;
> }
> while (candidate_ptr > score_ptr) {
> *candidate_ptr = *(candidate_ptr - 1);
> candidate_ptr--;
> }
> Yes, it's bloated code that should do the same thing as the memmove, but most 
> importantly the code has never caused any problems.  Interestingly, even this 
> code shows memmove in the assembly code (gcc -S), but only for the second 
> while loop.  The looping code for the first while loop looks like this and 
> moves 8 uint16_t's in just 5 instructions so it is perhaps not as inefficient 
> as the source code looks:
> .L25:
> movdqu -16(%rax), %xmm1
> subq $16, %rax
> movups %xmm1, 2(%rax)
> cmpq %rdx, %rax
> jnb .L25
> 
> It may or may not matter, but the code this is happening on is very CPU 
> intensive - there can be up to 8 threads running at the same time when this 
> problem occurs.  The problem doesn't occur consistently, it seems to be 
> rather random.  The program runs about 500 iterations of ranking up to the 
> top 30,000 new grammar rule candidates over nearly 4 hours on my test case 
> and has crashed on different iterations each time it has crashed, even though 
> the thread that seems to be crashing should be seeing exactly the same data 
> each time the program is run.  The malloc'ed array address could be changing, 
> I haven't checked that out.
> 
> I find it really hard to believe there is a bug in memmove but that seems to 
> be what gdb and my testing are indicating.  So I am looking for advice on how 
> to better understand what is causing the program to crash.  I would like to 
> review the code memset is using, but have not been able to figure out how to 
> track that down.  Any help in understanding what code the compiler is using 
> for memmove would be helpful.  Are there other things I could possibly be 
> overlooking?  Are there any other things I should review or report that would 
> be helpful?  I could try to write a simplified test case if that would be 
> useful.

Is there any way you can use AddressSanitizer or UndefinedBehaviorSanitizer to 
double-check that you're not doing anything undefined? In this type of code it 
is very easy to miss small off-by-one errors and such.
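
If a sanitizer is not available, one way to catch an off-by-one before it 
reaches memmove is to route the call through a checked helper. The following 
is only a minimal sketch: shift_up_checked and its capacity, lo, and hi 
parameters are hypothetical names, not part of GLZA, and it assumes the 
intent of line 904 is to shift elements [lo, hi) up by one slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of GLZA: opens a gap at index lo by
 * shifting elements [lo, hi) up by one slot, so index hi is the last
 * one written. Returns 0 on success, or -1 if the indices would read
 * or write outside an array of `capacity` elements. */
static int shift_up_checked(uint16_t *array, size_t capacity,
                            size_t lo, size_t hi)
{
    if (array == NULL || lo > hi || hi >= capacity)
        return -1;  /* would touch memory outside the allocation */
    memmove(&array[lo + 1], &array[lo], (hi - lo) * sizeof array[0]);
    return 0;
}
```

With the values from the backtrace, the call would be 
shift_up_checked(candidates_index, max_scores, new_score_rank, rank), and a 
-1 return would point at bad indices rather than at memmove itself.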

As far as I know, Cygwin's gcc does not have AddressSanitizer, but if you can 
compile the same code with Visual Studio, you can use its AddressSanitizer.
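
A small standalone comparison may also be easier to move between compilers 
than the whole program. The sketch below is an assumption about what such a 
simplified test could look like, not code from GLZA: the function names are 
made up, and it only checks that the memmove form and a plain element loop 
give the same result in a single thread, so it would not reproduce a 
threading-related problem.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Reference: element-by-element shift, walking from high to low,
 * equivalent to the second while loop in the original message. */
static void shift_up_loop(uint16_t *a, size_t lo, size_t hi)
{
    for (size_t i = hi; i > lo; i--)
        a[i] = a[i - 1];
}

/* The same shift written with memmove, in the form used at line 904. */
static void shift_up_memmove(uint16_t *a, size_t lo, size_t hi)
{
    memmove(&a[lo + 1], &a[lo], sizeof a[0] * (hi - lo));
}

/* Returns 1 if both versions leave identical arrays, 0 on a mismatch,
 * and -1 if allocation fails. Requires lo <= hi < n. */
static int shifts_agree(size_t n, size_t lo, size_t hi)
{
    uint16_t *x = malloc(n * sizeof *x);
    uint16_t *y = malloc(n * sizeof *y);
    int same;

    if (x == NULL || y == NULL) {
        free(x);
        free(y);
        return -1;
    }
    for (size_t i = 0; i < n; i++)
        x[i] = y[i] = (uint16_t)(i * 7u + 1u);  /* arbitrary fill pattern */

    shift_up_loop(x, lo, hi);
    shift_up_memmove(y, lo, hi);
    same = (memcmp(x, y, n * sizeof *x) == 0);

    free(x);
    free(y);
    return same;
}
```

Calling shifts_agree(4883, 2633, 4380) uses the max_scores, new_score_rank, 
and rank values from the backtrace. With a gcc or clang that supports it, 
building with -fsanitize=address,undefined would add the sanitizer checks; 
the Visual Studio equivalent for AddressSanitizer is /fsanitize=address.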

-Dimitry


-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
