Perhaps, but it will take some time to install Visual Studio and figure out how 
to run it.
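
In the meantime, a small standalone test might help separate memmove from the rest of the program. What follows is only a sketch of what such a test could look like, not code from GLZA: compare_shifts, shift_with_memmove and shift_with_loop are hypothetical names. It fills two arrays with known values, shifts one with the same call shape as line 904 and the other with a plain top-down loop, then compares them. It is single threaded, so it would not reproduce anything that depends on the other threads running.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: same call shape as GLZAcompress.c line 904. */
static void shift_with_memmove(uint16_t *a, uint32_t new_rank, uint32_t rank) {
  memmove(&a[new_rank + 1], &a[new_rank], 2 * (rank - new_rank));
}

/* Hypothetical helper: element-by-element copy, highest index first. */
static void shift_with_loop(uint16_t *a, uint32_t new_rank, uint32_t rank) {
  uint16_t *score_ptr = &a[new_rank];
  uint16_t *candidate_ptr = &a[rank];
  while (candidate_ptr > score_ptr) {
    *candidate_ptr = *(candidate_ptr - 1);
    candidate_ptr--;
  }
}

/* Returns 0 if both methods leave identical arrays, nonzero otherwise
   (-1 if allocation fails). */
static int compare_shifts(uint32_t max_scores, uint32_t new_rank, uint32_t rank) {
  /* Precondition assumed by this sketch: the shift stays inside the array. */
  assert(new_rank <= rank && rank < max_scores);
  uint16_t *a = malloc(max_scores * sizeof(uint16_t));
  uint16_t *b = malloc(max_scores * sizeof(uint16_t));
  if (a == NULL || b == NULL) {
    free(a);
    free(b);
    return -1;
  }
  for (uint32_t i = 0; i < max_scores; i++)
    a[i] = b[i] = (uint16_t)i;
  shift_with_memmove(a, new_rank, rank);
  shift_with_loop(b, new_rank, rank);
  int result = memcmp(a, b, max_scores * sizeof(uint16_t));
  free(a);
  free(b);
  return result != 0;
}
```

Calling compare_shifts(4883, 2633, 4380) from main would exercise the sizes gdb reported (max_scores, new_score_rank and rank); looping over other rank pairs, or running it from several threads at once, would be the obvious next step.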

Best Regards,

Kennon

> On 02/25/2026 2:09 AM PST Dimitry Andric <[email protected]> 
> wrote:
> 
>  
> On 24 Feb 2026, at 19:38, KENNON J CONRAD via Cygwin <[email protected]> 
> wrote:
> > 
> >   I am having a problem that is apparently related to memmove and am 
> > looking for some advice on how to investigate further.  This winter I have 
> > been working to simplify GLZA source code and make it more readable.  GLZA 
> > is an advanced open-source straight-line grammar compressor first 
> > released in 2015.  Among these changes was replacing some rather bloated 
> > code with memmove and memset in various locations.  The program started 
> > crashing occasionally and after extensively reviewing the changes, I was 
> > unable to find a cause for these crashes.  So I installed gdb to try to 
> > find out what was going on and was apparently able to find the cause of the 
> > problem.  As a new gdb user, I am not very comfortable with trusting the 
> > results of what gdb is showing, but it is pointing directly to one of the code 
> > changes I made.  I backed out of this code change and the program has not 
> > crashed after 3 days of nearly continuous testing.
> > 
> >   So here is what gdb reports when backtrace is run immediately after 
> > reporting a "SIGTRAP":
> > 
> > (gdb) bt full
> > #0 0x00007ff9dd8aa98b in KERNELBASE!DebugBreak () from 
> > /cygdrive/c/Windows/system32/KERNELBASE.dll
> > No symbol table info available.
> > #1 0x00007ff9ca3b6417 in cygwin1!.assert () from 
> > /cygdrive/c/Windows/cygwin1.dll
> > No symbol table info available.
> > #2 0x00007ff9ca3cfb18 in secure_getenv () from 
> > /cygdrive/c/Windows/cygwin1.dll
> > No symbol table info available.
> > #3 0x00007ff9e03dd82d in ntdll!.chkstk () from 
> > /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > No symbol table info available.
> > #4 0x00007ff9e038916b in ntdll!RtlRaiseException () from 
> > /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > No symbol table info available.
> > #5 0x00007ff9e03dc9ee in ntdll!KiUserExceptionDispatcher () from 
> > /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > No symbol table info available.
> > #6 0x00007ff9ca3b12a9 in memmove () from /cygdrive/c/Windows/cygwin1.dll
> > No symbol table info available.
> > #7 0x0000000100409a7c in rank_scores_thread (arg=0x6ffece890010) at 
> > GLZAcompress.c:904
> > new_score_rank = 2633
> > new_score_lmi2 = 183964750
> > new_score_pmi2 = 183964725
> > rank = 4380
> > max_rank = 2633
> > num_symbols = 25
> > new_score_lmi = 92079851
> > new_score_pmi = 92079826
> > thread_data_ptr = 0x6ffece890010
> > max_scores = 4883
> > candidates_index = 0xa00034470
> > score_index = 4380
> > node_score_num_symbols = 7
> > num_candidates = 4381
> > node_ptrs_num = 12224
> > local_write_index = 12225
> > rank_scores_buffer = 0x6ffece890020
> > candidates = 0x6ffece990020
> > score = 47.6283531
> > #8 0x00007ff9ca412eec in cygwin1!.getreent () from 
> > /cygdrive/c/Windows/cygwin1.dll
> > No symbol table info available.
> > #9 0x00007ff9ca3b47d3 in cygwin1!.assert () from 
> > /cygdrive/c/Windows/cygwin1.dll
> > No symbol table info available.
> > #10 0x0000000000000000 in ?? ()
> > No symbol table info available.
> > 
> > GLZAcompress.c line 904 is as follows and is in code that runs as a 
> > separate thread created in main:
> > memmove(&candidates_index[new_score_rank+1], 
> > &candidates_index[new_score_rank], 2 * (rank - new_score_rank));
> > This does point directly to where a code change was made.
> > 
> > candidates_index is allocated in main and not ever intentionally changed 
> > until deallocated at the end of program execution:
> > if (0 == (candidates_index = (uint16_t *)malloc(max_scores * 
> > sizeof(uint16_t))))
> >  fprintf(stderr, "ERROR - memory allocation failed\n");
> > This value is passed to the thread in a structure pointed to by the thread 
> > arg.  The value 0xa00034470 for candidates_index is similar to what is 
> > reported on subsequent runs with added code to print this value so I don't 
> > think it's corrupted, but would need to duplicate the crash after checking 
> > the initial value to be 100% certain.  With gdb reporting that rank = 4380 
> > and new_score_rank = 2633 at the time of the SIGTRAP, this should be a 
> > backward move of 1747 uint16_t values by 2 bytes with a 2 byte difference 
> > between the source and destination addresses.
> > 
> > Prior to this code change and for the last 3 days I have been using this 
> > code instead and not seen any crashes:
> > uint16_t * score_ptr = &candidates_index[new_score_rank];
> > uint16_t * candidate_ptr = &candidates_index[rank];
> > while (candidate_ptr >= score_ptr + 8) {
> > *candidate_ptr = *(candidate_ptr - 1);
> > *(candidate_ptr - 1) = *(candidate_ptr - 2);
> > *(candidate_ptr - 2) = *(candidate_ptr - 3);
> > *(candidate_ptr - 3) = *(candidate_ptr - 4);
> > *(candidate_ptr - 4) = *(candidate_ptr - 5);
> > *(candidate_ptr - 5) = *(candidate_ptr - 6);
> > *(candidate_ptr - 6) = *(candidate_ptr - 7);
> > *(candidate_ptr - 7) = *(candidate_ptr - 8);
> > candidate_ptr -= 8;
> > }
> > while (candidate_ptr > score_ptr) {
> > *candidate_ptr = *(candidate_ptr - 1);
> > candidate_ptr--;
> > }
> > Yes, it's bloated code that should do the same thing as the memmove, but 
> > most importantly the code has never caused any problems.  Interestingly, 
> > even this code shows memmove in the assembly code (gcc -S), but only for 
> > the second while loop.  The looping code for the first while loop looks 
> > like this and moves 8 uint16_t's in just 5 instructions so it is perhaps not 
> > as inefficient as the source code looks:
> > .L25:
> > movdqu -16(%rax), %xmm1
> > subq $16, %rax
> > movups %xmm1, 2(%rax)
> > cmpq %rdx, %rax
> > jnb .L25
> > 
> > It may or may not matter, but the code this is happening on is very CPU 
> > intensive - there can be up to 8 threads running at the same time when this 
> > problem occurs.  The problem doesn't occur consistently, it seems to be 
> > rather random.  The program runs about 500 iterations of ranking up to the 
> > top 30,000 new grammar rule candidates over nearly 4 hours on my test case 
> > and has crashed on different iterations each time it has crashed, even 
> > though the thread that seems to be crashing should be seeing exactly the 
> > same data each time the program is run.  The malloc'ed array address could 
> > be changing, I haven't checked that out.
> > 
> > I find it really hard to believe there is a bug in memmove but that seems 
> > to be what gdb and my testing are indicating.  So I am looking for advice 
> > on how to better understand what is causing the program to crash.  I would 
> > like to review the code memset is using, but have not been able to figure 
> > out how to track that down.  Any help in understanding what code the 
> > compiler is using for memmove would be helpful.  Are there other things I 
> > could possibly be overlooking?  Are there any other things I should review or 
> > report that would be helpful?  I could try to write a simplified test case 
> > if that would be useful.
> 
> Is there any way you can use AddressSanitizer or UndefinedBehaviorSanitizer 
> to double-check that you're not doing anything undefined? In this type of 
> code it is very easy to miss small off-by-one errors and such.
> 
> As far as I know, Cygwin's gcc does not have AddressSanitizer, but if you can 
> compile the same code with Visual Studio, you can use its AddressSanitizer.
> 
> -Dimitry
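
Until a sanitizer build is available, a hand-written range check in front of the call can catch the off-by-one class of error mentioned above. The sketch below is an assumption, not code from GLZA: checked_shift is a hypothetical wrapper that verifies new_score_rank <= rank < max_scores before issuing the same memmove as line 904. The highest element written is index rank, so that is the bound that matters. With the values from the backtrace (rank = 4380, max_scores = 4883) the check passes, and it cannot detect another thread modifying the array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper: returns 0 after shifting, -1 if the arguments
   would make memmove read or write outside the array. */
static int checked_shift(uint16_t *candidates_index, uint32_t max_scores,
                         uint32_t new_score_rank, uint32_t rank) {
  if (candidates_index == NULL)
    return -1;
  if (new_score_rank > rank)
    return -1; /* unsigned length would wrap to a huge value */
  if (rank >= max_scores)
    return -1; /* index rank is the last element written */
  memmove(&candidates_index[new_score_rank + 1],
          &candidates_index[new_score_rank],
          2 * (rank - new_score_rank));
  return 0;
}
```

A failing return could be logged with the three values, which would show whether the arguments were already bad before memmove was entered.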

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
