Hi,

Sorry for the late reply; I was on vacation.

> What is the performance impact of this for currently working applications that
> use a single thread to program flow rules.  You are adding a couple of system
> calls to what was formerly a totally usermode operation.

If I understand correctly, in the uncontended single-thread case,
pthread_mutex_lock() should not enter kernel space.
I also wrote a small application using a pthread mutex, and strace showed that
no system calls were introduced.
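To illustrate why the uncontended path can stay in user space, below is a simplified futex-based lock in the style described in Ulrich Drepper's "Futexes Are Tricky". It is only a sketch, assuming Linux and C11 atomics; glibc's real pthread_mutex_lock() differs in detail, and the names sketch_mutex_t, sketch_lock() and sketch_unlock() are hypothetical. The point is that lock and unlock are a single atomic instruction each when there is no contention, and futex() is only called when another thread has to sleep or be woken.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock state: 0 = unlocked, 1 = locked, 2 = locked with waiters. */
typedef struct {
        atomic_int state;
} sketch_mutex_t;

static void sketch_lock(sketch_mutex_t *m)
{
        int c = 0;

        /* Fast path: uncontended 0 -> 1 transition, no system call. */
        if (atomic_compare_exchange_strong(&m->state, &c, 1))
                return;

        /* Slow path: mark the lock as contended, then sleep in the kernel. */
        if (c != 2)
                c = atomic_exchange(&m->state, 2);
        while (c != 0) {
                /* Sleeps only if the state is still 2 when the kernel checks. */
                syscall(SYS_futex, &m->state, FUTEX_WAIT_PRIVATE, 2, NULL, NULL, 0);
                c = atomic_exchange(&m->state, 2);
        }
}

static void sketch_unlock(sketch_mutex_t *m)
{
        /* Old state 1 means nobody is waiting, so no system call is made. */
        if (atomic_exchange(&m->state, 0) != 1)
                syscall(SYS_futex, &m->state, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}
```

In the single-thread case only the compare-and-swap and the exchange execute, which is consistent with strace reporting no system calls.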

The simple test code below compares the per-iteration cycle cost of a pthread
mutex against an rte_spinlock.

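#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#include <rte_cycles.h>   /* rte_rdtsc() */
#include <rte_spinlock.h> /* rte_spinlock_t */

/* Uncomment to benchmark rte_spinlock instead of a pthread mutex. */
//#define FT_USE_SPIN 1
#define NLOOP 1000000     /* number of timed lock/unlock iterations */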

#ifdef FT_USE_SPIN
static rte_spinlock_t sp_lock;
#else
static pthread_mutex_t ml_lock;
#endif

static inline void ft_init_lock(void)
{
#ifdef FT_USE_SPIN
        rte_spinlock_init(&sp_lock);
#else
        pthread_mutex_init(&ml_lock, NULL);
#endif
}

static inline void ft_unlock(void)
{
#ifdef FT_USE_SPIN
        rte_spinlock_unlock(&sp_lock);
#else
        pthread_mutex_unlock(&ml_lock);
#endif
}

static inline void ft_lock(void)
{
#ifdef FT_USE_SPIN
        rte_spinlock_lock(&sp_lock);
#else
        pthread_mutex_lock(&ml_lock);
#endif
}

static void ft_check_cycles(void)
{
        static int init = 0;
        uint64_t start, end;
        int i, n;

        if (!init) {
                init = 1;
                ft_init_lock();
        }

        /* Make code hot. */
        ft_lock();
        n = 0;
        ft_unlock();

        start = rte_rdtsc();
        for (i = 0; i < NLOOP; i++) {
                ft_lock();
                n++;
                ft_unlock();
        }
        end = rte_rdtsc();
        /* Use double: total cycles can exceed float's exact integer range. */
        printf("loop:%d, cycles per loop:%f\n", n, (double)(end - start) / n);
}

Both showed a similar cost of around 50 cycles per loop.

A pthread mutex was chosen here because most DPDK applications, such as
OVS-DPDK, use it.

Thanks,
SuanmingMou
