On Wed, 18 Dec 2024 at 16:43, Guido Jäkel <g.jae...@dnb.de> wrote:
>
> > Did I understand you correctly that you were seeing the issue before the
> > upgrade as well, but more frequent now?
>
> Yes, we saw it before as well, during the last 12 months, using the
> "previous" version (lxc container image) of our Subversion service. But it
> happened too seldom to really care about. Now, with the "current" version
> (refreshed these days), it happens within hours. To build software here,
> there's also a build chain and system libs, like glibc, that are about one
> year newer. This could make a mutex issue more problematic because of
> different timings.
>
> We never observe the issue in our test or approval stage, but there's no
> "real world load" on the Subversion service there.
>
> The container also runs together with a whole bunch of others on one of
> our blade servers with 112 cores, but the Subversion container is limited
> to two cores.
>
> > Reading up on it, I see that this is exactly the same failure you got.
> > I'm leaning towards an APR issue (possibly related to the exact
> > kernel/libpthread you are using), but I never got around to floating it
> > on the APR list since there is still quite a bit of "Subversion code"
> > involved. My next step would be to try to write a minimal reproduction
> > application with a bunch of threads all randomly trying to get and
> > release the same mutex.
>
> If you provide me the source code, I'm probably able to compile and run it
> here, too.
>
I have attached my test program, which tries to exercise the mutex
implementation. The program builds a simple linked list to which 100
threads each append 100 items. In the end, the list should contain 100
items per thread, with a counter incrementing by 1 for each of a thread's
items, but with the items of different threads intermixed randomly. There
is of course a race condition when updating the pointer to the last node
of the linked list: two threads must not update it simultaneously, so the
update is protected by a mutex.

The Makefile must be adjusted to point to the correct location of the APR
library. After compiling, the program can be executed with no argument
(which defaults to using a mutex) or with a single "n" to run without a
mutex. Sample run:

[[[
$ ./main
Creating mutexes and threads... Done
Waiting for threads to finish... Finished
Checking for errors... No errors!
$ ./main n
Creating mutexes and threads... Done
Waiting for threads to finish... Finished
Checking for errors...
Thread 32 missing 6 found 7
Thread 48 missing 2 found 3
Thread 30 missing 14 found 15
]]]

To me, this shows that my basic program is correct: with a mutex, the
shared memory is protected, and without the mutex we get errors.

I have tested this under both Ubuntu and Guix with the same results. The
Guix environment is the same environment where I have been able to
reproduce the assert() in Subversion that started the discussion. So it
seems that the mutex code is not the culprit. But I still don't understand
why Subversion, running on Ubuntu with threading enabled, passes the test
suite while the same code running under Guix fails.

I would expect my test program to succeed under Gentoo as well, but it
would still be interesting to see.

...
I realise we accidentally had some discussion off-list, so for the benefit
of the list I will repeat one question I've already asked:

- Is it possible to run the Subversion test suite (make check -jN
  PARALLEL=N, where N is the number of parallel threads during testing),
  and does it randomly fail? If you check the tests.log file after a
  failure, do you see the same assertion as in svnserve?

Cheers,
Daniel
```c
#include <apr_errno.h>
#include <apr_general.h>
#include <apr_pools.h>
#include <apr_portable.h>
#include <apr_thread_mutex.h>
#include <apr_thread_proc.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* strcmp() */
#include <unistd.h> /* usleep() */

/* Struct for a linked list, store the thread identifier and loop count */
typedef struct ll {
  int thread;
  int count;
  struct ll *next;
} ll_t;

/* An APR mutex */
apr_thread_mutex_t *ll_mutex = NULL;

/* How many threads to run */
#define THREADS 100

/* Array of threads */
apr_thread_t *threads[THREADS];

/* How many times the loop will run within each thread */
#define COUNTS 100

/* The current tail (last node) of the linked list. Each thread will try to
   update this */
ll_t *head = 0;

/* If we should use mutexes to protect the update of *head */
int useMutex = 1;

/* Macro taking care of APR error handling for functions returning
   apr_status_t. The argument is evaluated only once (so a failing call is
   not repeated when formatting the message), and do/while (0) makes the
   macro safe to use after an unbraced if. */
#define err(statcode) do { \
    apr_status_t err_status_ = (statcode); \
    if (err_status_ != APR_SUCCESS) { \
      char errmsg[80]; \
      apr_strerror(err_status_, errmsg, sizeof(errmsg)); \
      printf("Error %d on line %d: %s\n", err_status_, __LINE__, errmsg); \
      exit(err_status_); \
    } \
  } while (0)

/* The actual work performed by each thread */
void *APR_THREAD_FUNC threadfunc(apr_thread_t *thread, void *data)
{
  /* New item to be inserted into the linked list */
  ll_t *curr;
  /* The thread id is the loop counter when we create the thread. Store the
     value in a local variable since it might change */
  int threadid = *(int*)data;

  for (int i=0; i<COUNTS; i++) {
    /* Create a new linked list item and populate with data. We set ->count
       to the loop counter, meaning there should be one item for each number
       in [0 .. COUNTS-1]! */
    curr = malloc(sizeof(ll_t));
    if (curr == NULL)
      abort();
    curr->next = 0;
    curr->thread = threadid;
    curr->count = i;

    /* Append the item to the end of the linked list. Acquire the mutex
       unless the command line arguments told us NOT to use the mutex. */
    if (useMutex)
      err(apr_thread_mutex_lock(ll_mutex));
    head->next = curr;
    head = curr;
    /* Sleep some to increase the chance that some other thread tries to
       update head */
    usleep(10);
    apr_thread_yield();
    /* Finally release the mutex and let some other threads work */
    if (useMutex)
      err(apr_thread_mutex_unlock(ll_mutex));
    usleep(10);
    apr_thread_yield();
  }

  /* We're done, exit the thread and return */
  apr_thread_exit(thread, APR_SUCCESS);
  return NULL;
}

int main(int argc, char *argv[])
{
  /* Pointer to the start of the linked list, used only in the main thread */
  ll_t *start = 0;
  /* This is an array of the expected value of the next .count for the given
     thread */
  int next[THREADS];

  /* The command line argument n will make us not use a mutex */
  if (argc > 1 && strcmp(argv[1], "n") == 0)
    useMutex = 0;

  /* Initialize APR and create a memory pool */
  err(apr_initialize());
  atexit(apr_terminate);
  apr_pool_t *pool;
  err(apr_pool_create(&pool, NULL));

  /* Allocate some memory for the initial (dummy) linked list entry */
  start = malloc(sizeof(ll_t));
  if (start == NULL)
    abort();
  start->thread = -1;
  start->count = -1;
  start->next = 0;
  head = start;

  /* Create the threads, having threadfunc do the work */
  apr_threadattr_t *threadattr;
  err(apr_threadattr_create(&threadattr, pool));
  printf("Creating mutex and threads...");
  err(apr_thread_mutex_create(&ll_mutex, APR_THREAD_MUTEX_DEFAULT, pool));
  for (int i = 0; i < THREADS; ++i) {
    /* We borrow the next array to pass the thread number to threadfunc
       *data to make sure each thread has its own memory for thread id */
    next[i] = i;
    err(apr_thread_create(&threads[i], threadattr, threadfunc, &next[i], pool));
  }
  printf(" Done\n");

  /* Wait for threads to finish */
  printf("Waiting for threads to finish...");
  for (int i = 0; i < THREADS; ++i) {
    apr_status_t retval;
    err(apr_thread_join(&retval, threads[i]));
    /* Reset the next array to only zero, this is the expected .count for
       the first item */
    next[i] = 0;
  }
  printf(" Finished\n");

  /* Walk through the linked list. We expect a list of items with randomly
     mixed ->thread numbers, but for each ->thread, the ->count should
     increment by 1 each time that thread number appears. This could fail if
     two threads tried to update head at the same time, in which case some
     ->count would be missing */
  printf("Checking for errors...");
  int errors = 0;
  head = start->next;
  while (head) {
    /* Uncomment the following line to print the content of the linked list
       in memory order */
    /* printf("Current %p Thread %d Count %d Next %p\n", head, head->thread,
       head->count, head->next); */
    assert(head->thread >= 0 && head->thread < THREADS);
    if (head->count != next[head->thread]) {
      printf("\nThread %d missing %d found %d", head->thread,
             next[head->thread], head->count);
      ++errors;
    }
    /* We saw ->count for ->thread, note it down in the array until we see
       the same thread again */
    next[head->thread] = head->count+1;
    head = head->next;
  }

  /* Clean up and report result */
  apr_pool_destroy(pool);
  if (errors == 0) {
    printf(" No errors!\n");
  } else {
    printf("\n");
  }
  /* Exit status is truncated to 8 bits, so report failure as 0/1 rather
     than returning the raw error count */
  return errors != 0;
}
```
Makefile