Hi,

Sorry for the late reply; I was on vacation.

> What is the performance impact of this for currently working applications that
> use a single thread to program flow rules. You are adding a couple of system
> calls to what was formerly a totally usermode operation.

If I understand correctly, in the uncontended single-thread case,
pthread_mutex_lock() should not enter kernel space. I also wrote a small
application that uses a pthread mutex, and strace showed that no system call
was introduced.

The simple test code below compares the per-iteration cycle cost of a pthread
mutex and a spinlock.

#include <stdio.h>
#include <stdint.h>
#include <pthread.h>
#include <rte_cycles.h>
#include <rte_spinlock.h>

/* Uncomment to measure rte_spinlock instead of the pthread mutex. */
//#define FT_USE_SPIN 1
#define NLOOP 1000000

#ifdef FT_USE_SPIN
static rte_spinlock_t sp_lock;
#else
static pthread_mutex_t ml_lock;
#endif

static inline void
ft_init_lock(void)
{
#ifdef FT_USE_SPIN
	rte_spinlock_init(&sp_lock);
#else
	pthread_mutex_init(&ml_lock, NULL);
#endif
}

static inline void
ft_unlock(void)
{
#ifdef FT_USE_SPIN
	rte_spinlock_unlock(&sp_lock);
#else
	pthread_mutex_unlock(&ml_lock);
#endif
}

static inline void
ft_lock(void)
{
#ifdef FT_USE_SPIN
	rte_spinlock_lock(&sp_lock);
#else
	pthread_mutex_lock(&ml_lock);
#endif
}

static void
ft_check_cycles(void)
{
	static int init = 0;
	uint64_t start, end;
	int i, n;

	if (!init) {
		init = 1;
		ft_init_lock();
	}
	/* Make code hot. */
	ft_lock();
	n = 0;
	ft_unlock();

	/* Time NLOOP uncontended lock/unlock rounds. */
	start = rte_rdtsc();
	for (i = 0; i < NLOOP; i++) {
		ft_lock();
		n++;
		ft_unlock();
	}
	end = rte_rdtsc();
	printf("loop:%d, cycles per loop:%f\n", n, (end - start) / (float)n);
}

Both variants showed a similar cost of around 50 cycles per loop.

The pthread mutex was chosen here because most DPDK applications, such as
OVS-DPDK, use it.

Thanks,
SuanmingMou
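
For reference, the strace check mentioned above can be reproduced with something along the lines of the sketch below. It is only an illustration and not the exact program that was run: the helper name uncontended_rounds() is hypothetical, and it deliberately uses only pthreads (no DPDK) so that it builds anywhere.

```c
#include <pthread.h>

/* Statically initialized, so no separate init call is needed. */
static pthread_mutex_t ml_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical helper: lock and unlock an uncontended mutex nloop times
 * from the calling thread. Returns the number of completed rounds, or -1
 * if a lock or unlock call reports an error.
 */
static long
uncontended_rounds(long nloop)
{
	long i, n = 0;

	for (i = 0; i < nloop; i++) {
		if (pthread_mutex_lock(&ml_lock) != 0)
			return -1;
		n++;
		if (pthread_mutex_unlock(&ml_lock) != 0)
			return -1;
	}
	return n;
}
```

Calling uncontended_rounds() from a small main(), building with -pthread and running the binary under strace should show whether the loop itself issues any futex calls. With glibc on Linux the uncontended lock and unlock paths are handled with atomic operations in user space, so none are expected from the loop.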