On Mon, Oct 23, 2017 at 4:04 PM, Nicolai Hähnle <nicolai.haeh...@amd.com> wrote: > On 23.10.2017 13:50, Grazvydas Ignotas wrote: >> >> On Sun, Oct 22, 2017 at 9:33 PM, Nicolai Hähnle <nhaeh...@gmail.com> >> wrote: >>> >>> From: Nicolai Hähnle <nicolai.haeh...@amd.com> >>> >>> Fences are now 4 bytes instead of 96 bytes (on my 64-bit system). >>> >>> Signaling a fence is a single atomic operation in the fast case plus a >>> syscall in the slow case. >>> >>> Testing if a fence is signaled is the same as before (a simple >>> comparison), >>> but waiting on a fence is now no more expensive than just testing it in >>> the fast (already signaled) case. >>> --- >>> src/util/futex.h | 4 +++ >>> src/util/u_queue.c | 2 ++ >>> src/util/u_queue.h | 88 >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>> 3 files changed, 94 insertions(+) >>> >>> ... >>> >>> diff --git a/src/util/u_queue.h b/src/util/u_queue.h >>> index a3e12260e30..3d9f19f4e6c 100644 >>> --- a/src/util/u_queue.h >>> +++ b/src/util/u_queue.h >>> @@ -28,30 +28,117 @@ >>> * >>> * Jobs can be added from any thread. After that, the wait call can be >>> used >>> * to wait for completion of the job. >>> */ >>> >>> #ifndef U_QUEUE_H >>> #define U_QUEUE_H >>> >>> #include <string.h> >>> >>> +#include "util/futex.h" >>> #include "util/list.h" >>> +#include "util/macros.h" >>> #include "util/u_thread.h" >>> >>> #ifdef __cplusplus >>> extern "C" { >>> #endif >>> >>> #define UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY (1 << 0) >>> #define UTIL_QUEUE_INIT_RESIZE_IF_FULL (1 << 1) >>> >>> +#if defined(__GNUC__) && defined(HAVE_FUTEX) >>> +#define UTIL_QUEUE_FENCE_FUTEX >>> +#else >>> +#define UTIL_QUEUE_FENCE_STANDARD >>> +#endif >>> + >>> +#ifdef UTIL_QUEUE_FENCE_FUTEX >>> +/* Job completion fence. >>> + * Put this into your job structure. >>> + */ >>> +struct util_queue_fence { >>> + /* The fence can be in one of three states: >>> + * 0 - signaled >>> + * 1 - unsignaled >>> + * 2 - unsignaled, may have waiters >>> + */ >>> + uint32_t val; >>> +}; >>> + >>> +static inline void >>> +util_queue_fence_init(struct util_queue_fence *fence) >>> +{ >>> + fence->val = 0; >>> +} >>> + >>> +static inline void >>> +util_queue_fence_destroy(struct util_queue_fence *fence) >>> +{ >>> + assert(fence->val == 0); >>> + /* no-op */ >>> +} >>> + >>> +static inline void >>> +util_queue_fence_wait(struct util_queue_fence *fence) >>> +{ >>> + uint32_t v = fence->val; >>> + >>> + if (likely(v == 0)) >>> + return; >>> + >>> + do { >>> + if (v != 2) >>> + v = __sync_val_compare_and_swap(&fence->val, 1, 2); >>> + >>> + futex_wait(&fence->val, 2); >>> + v = fence->val; >>> + } while(v != 0); >>> +} >>> + >>> +static inline void >>> +util_queue_fence_signal(struct util_queue_fence *fence) >>> +{ >>> + uint32_t val = __sync_lock_test_and_set(&fence->val, 0); >> >> >> As this is _signal(), don't you need a full barrier here? > > > You're right. It's a bit surprising that __sync_lock_test_and_set isn't one. > > >>> + >>> + assert(val != 0); >>> + >>> + if (val == 2) >> >> >> The documentation says some architectures may only support a constant >> of 1 here... > > > Again, surprising, but you're right. > > I believe both of these could be addressed by using > __sync_val_compare_and_swap instead, right?
I believe so too. > Or we could move to new-style gcc atomic built-ins, but I'm not sure > off-hand how widely available those are. There's u_atomic.h that already uses some __atomic_*() built-ins, p_atomic_cmpxchg is just __sync_val_compare_and_swap but could use it for consistency with other mesa code. Gražvydas _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev