Den ons 18 dec. 2024 kl 16:43 skrev Guido Jäkel <g.jae...@dnb.de>:

>
> > Did I understand you correctly that you were seeing the issue before the
> upgrade as well, but more frequent now?
> Yes, we saw it also before during the last 12m using the "previous"
> version (lxc container image) of our subversion service. But this was to
> seldom to really care about. Now, with the "current" version -- refreshed
> this days -- it happens within hours. To build software here, there's also
> an about one year more current build chain or system libs like the glibc.
> This could cause a mutex issue to become more problematic because of
> different timings.
>
> We never observe the issue in our test or approval stage, but there's no
> "real world load" on the subversion service.
>
> The container is also executed together with a whole bunch on one of our
> blade server with 112 cores, but the Subversion container is limited to two
> cores.
>
> > Reading up on it, I see that this is exactly the same failure you got.
> I'm leaning towards an APR issue (possibly related to the exact
> kernel/libpthread you are using), I never got around to float it on the APR
> list since it is still quite a bit of "Subversion code" involved. My next
> step would be to try to write a minimum reproduction application with a
> bunch of threads all randomy trying to get and release the same mutex.
>
> If you provide me the source code I'm probably able to compile an run it
> here, too.
>

I have attached my test program which tries to exercise the mutex
libraries. The program contains a simple linked list with 100 threads
adding 100 items to the end of the list. In the end, the list should
contain 100 items for each thread with a counter incrementing by 1 (per
thread), but with threads intermixed randomly. There is of course a race
condition updating the pointer to the last node of the linked list where
two threads mustn't update it simultaneously, so it is protected by a mutex.

The Makefile must be adjusted to point to the correct locations of the APR
library.

After compiling, the program can be executed with no argument (which
defaults to using a mutex) or with a single "n" to run without mutex.

Sample run:
[[[
$ ./main
Creating mutexes and threads... Done
Waiting for threads to finish... Finished
Checking for errors... No errors!
$ ./main n
Creating mutexes and threads... Done
Waiting for threads to finish... Finished
Checking for errors...
Thread 32 missing 6 found 7
Thread 48 missing 2 found 3
Thread 30 missing 14 found 15
]]]

To me, this proves that my basic program is correct and that with a mutex,
the shared memory is protected and without the mutex we get errors. I have
tested this under both Ubuntu and Guix with the same results.

The Guix environment is the same environment where I have been able to
reproduce the assert() in Subversion which started the discussion. So it
seems that the mutex code is not the culprit. But I still don't understand
why Subversion, running on Ubuntu with threading enabled can pass the test
suite while the same code running under Guix is failing.

I would expect that my test program also succeeds under Gentoo, but it
would still be interesting to see.

...

I realise we accidentally had some discussion offlist, so for the benefit
of the list I will repeat one question I've already asked:

- Is it possible to run the Subversion test suite (make check -jN
PARALLEL=N, where N is the number of parallell threads during testing) and
does it randomly fail? If you check the tests.log file after a failure, do
you see the same assertion as in svnserve?

Cheers,
Daniel
#include <apr_pools.h>
#include <apr_portable.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Struct for a linked list, store the thread identifier and loop count */
typedef struct ll
{
  int thread;
  int count;
  struct ll *next;
} ll_t;

/* An APR mutex */
apr_thread_mutex_t *ll_mutex = NULL;

/* How many threads to run */
#define THREADS 100

/* Array of threads */
apr_thread_t *threads[THREADS];

/* How many times the loop will run within each thread */
#define COUNTS 100

/* The current HEAD of the linked list. Each thread will try to update this */
ll_t *head = 0;

/* If we should use mutexes to protect the update of *head */
int useMutex = -1;
 
/* Macro taking care of APR error handling for functions returning apr_status_t */ 
#define err(statcode) if (statcode != 0) { \
  char errmsg[80]; \
  apr_strerror(statcode, errmsg, sizeof(errmsg)); \
  printf("Error %d on line %d: %s\n", statcode, __LINE__, errmsg);	\
  exit(statcode); \
}

/* The actual work performed by each thread */
void *threadfunc(apr_thread_t *thread, void *data)
{
  /* New item to be inserted into the linked list */
  ll_t *curr;
  
  /* The thread id is the loop counter when we create the thread. Store
     the value in a local variable since it might change */
  int threadid = *(int*)data;
  
  for (int i=0; i<COUNTS; i++) {
    /* Create a new linked list item and populate with data.
       We set ->count to the loop counter, meaning there should be one item
       for each number in [0 .. COUNTS-1]! */
    curr = malloc(sizeof(ll_t));
    curr->next = 0;
    curr->thread = threadid;
    curr->count = i;
    
    /* Append the item to the end of the linked list. Aquire the mutext unless
       command line arguments told us to NOT use the mutex. */
    if (useMutex)
      err(apr_thread_mutex_lock(ll_mutex));
    head->next = curr;
    head = curr;

    /* Sleep some to increase the chance that some other thread tries to update head */
    usleep(10);
    apr_thread_yield();
    
    /* Finally release the mutex and let some other threads work */
    if (useMutex)
      err(apr_thread_mutex_unlock(ll_mutex));
    usleep(10);
    apr_thread_yield();
  }

  /* We're done - exit the thread and return */
  apr_thread_exit(thread, APR_SUCCESS);
  return NULL;
}

int main(int argc, char *argv[])
{
  /* Pointer to the start of the linked list, used only in the main thread */
  ll_t *start = 0;
  /* This is an array of the expected value of the next .count for the given thread */
int next[THREADS];
  
  /* The command line argument n will make us not use a mutex */
  if (argc > 1 && strcmp(argv[1], "n") == 0)
    useMutex = 0;

  /* Initialize APR and create a memory pool */
  err(apr_initialize());
  atexit(apr_terminate);
  apr_pool_t *pool;
  err(apr_pool_create(&pool, NULL));

  /* Allocate some memory for the initial linked list entry */
  start = malloc(sizeof(ll_t));
  start->thread = -1;
  start->count = -1;
  head = start;

  /* Create the threads, having threadfunc do the work */
  apr_threadattr_t *threadattr;
  err(apr_threadattr_create(&threadattr, pool));
  printf("Creating mutex and threads..."); 
  err(apr_thread_mutex_create(&ll_mutex, APR_THREAD_MUTEX_DEFAULT, pool));
  for (int i = 0; i < THREADS; ++i) {
    /* We borrow the next array to pass the thread number to threadfunc *data
       to make sure each thread has its own memory for thread id */
    next[i] = i;
    apr_thread_create(&threads[i], threadattr, threadfunc, &next[i], pool);
  }
  printf(" Done\n");
 
  /* Wait for threads to finish */ 
  printf("Waiting for threads to finish...");
  for (int i = 0; i < THREADS; ++i) {
    apr_status_t retval;
    apr_thread_join(&retval, threads[i]);
    /* Reset the next array to only zero - this is the expected .count for the first item */
    next[i] = 0;
  }
  printf(" Finished\n");
  
  /* Walk through the linked list
     We expect a list of items with random mixed ->thread number, but for each
     ->thread, the ->count should increment by 1 each time that thead numer appears.
     This could fail if two threads tried to update head at the same time, in which case
     some ->count would be missing */
  printf("Checking for errors...");
  int errors = 0;
  head = start->next;
  while (head) {
    /* Uncomment the following line to print the content of the linked list in memory order */
    /* printf("Current %p Thread %d Count %d Next %p\n", head, head->thread, head->count, head->next); */
    assert(head->thread >= 0 && head->thread < THREADS);
    if (head->count != next[head->thread]) {
      printf("\nThread %d missing %d found %d", head->thread, next[head->thread], head->count);
      ++errors;
    }
    /* We saw ->count for ->thread, note it down in the array until we see the same thread again */
    next[head->thread] = head->count+1;
    head = head->next;
  }

  /* Clean up and report result */
  apr_pool_destroy(pool);
  if (errors == 0) {
    printf(" No errors!\n");
  } else {
    printf("\n");
  }
  return errors;
}

Attachment: Makefile
Description: Binary data

Reply via email to