On Thu, Aug 28, 2014 at 09:57:42AM -0700, Gurucharan Shetty wrote:
> Before this change (i.e., with pthread locks for atomics on Windows),
> the benchmark for cmap and hmap was as follows:
>
> $ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
> Benchmarking with n=10000000, 3 threads, 1.00% mutations:
> cmap insert: 61070 ms
> cmap iterate: 2750 ms
> cmap search: 14238 ms
> cmap destroy: 8354 ms
>
> hmap insert: 1701 ms
> hmap iterate: 985 ms
> hmap search: 3755 ms
> hmap destroy: 1052 ms
>
> After this change, the benchmark is as follows:
> $ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
> Benchmarking with n=10000000, 3 threads, 1.00% mutations:
> cmap insert: 3666 ms
> cmap iterate: 365 ms
> cmap search: 2016 ms
> cmap destroy: 1331 ms
>
> hmap insert: 1495 ms
> hmap iterate: 1026 ms
> hmap search: 4167 ms
> hmap destroy: 1046 ms
>
> So there is clearly a big improvement for cmap.
>
> But the corresponding test on Linux (with gcc 4.6) yields the following:
>
> ./tests/ovstest test-cmap benchmark 10000000 3 1
> Benchmarking with n=10000000, 3 threads, 1.00% mutations:
> cmap insert: 3917 ms
> cmap iterate: 355 ms
> cmap search: 871 ms
> cmap destroy: 1158 ms
>
> hmap insert: 1988 ms
> hmap iterate: 1005 ms
> hmap search: 5428 ms
> hmap destroy: 980 ms
>
> So for this particular test, except for "cmap search", Windows and Linux
> have similar performance.  Windows is around 2.5x slower in "cmap search"
> compared to Linux.  This has to be investigated.
Either way it's a great improvement.  Thank you!

This comment needs an update:
> +/* This header implements atomic operation primitives using pthreads. */

We usually format long comments with a line of * down the left side:
> +/* From msdn documentation: With Visual Studio 2003, volatile to volatile
> +references are ordered; the compiler will not re-order volatile variable
> +access. With Visual Studio 2005, the compiler also uses acquire semantics
> +for read operations on volatile variables and release semantics for write
> +operations on volatile variables (when supported by the CPU). */
> +#define ATOMIC(TYPE) TYPE volatile

I see several funny casts with an extra space in them, e.g.
> +                                   (int32_t ) SRC);                       \
would normally be
> +                                   (int32_t) SRC);                        \

I'd normally expect 64-bit reads and writes to be atomic when we're
building for x86-64:
> +/* 64 bit reads and write are not atomic on x86. */

I think that the *s here should be outside the parentheses, so that DST is
fully parenthesized, e.g. *(DST) instead of (*DST).  Similarly for
atomic_read_explicit():
> +#define atomic_store_explicit(DST, SRC, ORDER)  \
> +    if (sizeof (*DST) == 1) {                   \
> +        atomic_storeX(8, DST, SRC, ORDER)       \
> +    } else if (sizeof (*DST) == 2) {            \
> +        atomic_storeX(16, DST, SRC, ORDER)      \
> +    } else if (sizeof (*DST) == 4) {            \
> +        atomic_store32(DST, SRC, ORDER)         \
> +    } else if (sizeof (*DST) == 8) {            \
> +        atomic_store64(DST, SRC, ORDER)         \
> +    }

Also I wonder whether we should add a final "else { abort(); }" just in
case?  Or "BUILD_ASSERT(sizeof *(DST) == 1 || sizeof *(DST) == 2 || ...);"?

I think that many of the macros should more carefully parenthesize their
argument expansions.  Here are three examples but I see others:
> +#define atomic_read32(SRC, DST, ORDER)                      \
> +    if (ORDER == memory_order_seq_cst) {                    \
> +        *DST = InterlockedOr((int32_t volatile *) SRC, 0);  \
> +    } else {                                                \
> +        *DST = *SRC;                                        \
> +    }
> +
> +/* 64 bit reads and writes are not atomic on x86. */
> +#define atomic_read64(SRC, DST, ORDER)                      \
> +    *DST = InterlockedOr64((int64_t volatile *) SRC, 0);
> +
> +/* For 8 and 16 bit variations. */
> +#define atomic_readX(X, SRC, DST, ORDER)                         \
> +    if (ORDER == memory_order_seq_cst) {                         \
> +        *DST = InterlockedOr##X((int##X##_t volatile *) SRC, 0); \
> +    } else {                                                     \
> +        *DST = *SRC;                                             \
> +    }
> +
> +#define atomic_read(SRC, DST) \
> +    atomic_read_explicit(SRC, DST, memory_order_seq_cst)

I am not sure that the stairstepping below is necessary:
> +#define atomic_compare_exchange_strong_explicit(DST, EXP, SRC, ORD1, ORD2) \
> +    (sizeof (*DST) == 1                                                    \
> +     ? atomic_compare_exchange8((int8_t volatile *) DST, (int8_t *) EXP,   \
> +                                (int8_t ) SRC)                             \
> +     : (sizeof (*DST) == 2                                                 \
> +        ? atomic_compare_exchange16((int16_t volatile *) DST,              \
> +                                    (int16_t *) EXP, (int16_t ) SRC)       \
> +        : (sizeof (*DST) == 4                                              \
> +           ? atomic_compare_exchange32((int32_t volatile *) DST,           \
> +                                       (int32_t *) EXP, (int32_t ) SRC)    \
> +           : (sizeof (*DST) == 8                                           \
> +              ? atomic_compare_exchange64((int64_t volatile *) DST,        \
> +                                          (int64_t *) EXP, (int64_t ) SRC) \
> +              : ovs_fatal(0,                                               \
> +                          "atomic operation with size greater than 8 bytes"), \
> +                atomic_compare_unreachable()))))

I think that it can be written as:

#define atomic_compare_exchange_strong_explicit(DST, EXP, SRC, ORD1, ORD2) \
    (sizeof (*DST) == 1                                                    \
     ? atomic_compare_exchange8((int8_t volatile *) DST, (int8_t *) EXP,   \
                                (int8_t ) SRC)                             \
     : sizeof (*DST) == 2                                                  \
     ? atomic_compare_exchange16((int16_t volatile *) DST, (int16_t *) EXP,\
                                 (int16_t ) SRC)                           \
     : sizeof (*DST) == 4                                                  \
     ? atomic_compare_exchange32((int32_t volatile *) DST,                 \
                                 (int32_t *) EXP, (int32_t ) SRC)          \
     : sizeof (*DST) == 8                                                  \
     ? atomic_compare_exchange64((int64_t volatile *) DST,                 \
                                 (int64_t *) EXP, (int64_t ) SRC)          \
     : (ovs_fatal(0, "atomic operation with size greater than 8 bytes"),   \
        atomic_compare_unreachable()))
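
To illustrate the parenthesization point, here is roughly what I have in
mind for atomic_read32 (untested, just a sketch):

/* Sketch only: each argument is parenthesized where it is expanded. */
#define atomic_read32(SRC, DST, ORDER)                          \
    if ((ORDER) == memory_order_seq_cst) {                      \
        *(DST) = InterlockedOr((int32_t volatile *) (SRC), 0);  \
    } else {                                                    \
        *(DST) = *(SRC);                                        \
    }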
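
And for the store case, with the extra parentheses and an abort() fallback
for unexpected sizes (again untested, assuming we keep the if/else form
from the patch):

/* Sketch only: same structure as the patch, but with DST fully
 * parenthesized inside sizeof and a fallback for unsupported sizes. */
#define atomic_store_explicit(DST, SRC, ORDER)  \
    if (sizeof *(DST) == 1) {                   \
        atomic_storeX(8, DST, SRC, ORDER)       \
    } else if (sizeof *(DST) == 2) {            \
        atomic_storeX(16, DST, SRC, ORDER)      \
    } else if (sizeof *(DST) == 4) {            \
        atomic_store32(DST, SRC, ORDER)         \
    } else if (sizeof *(DST) == 8) {            \
        atomic_store64(DST, SRC, ORDER)         \
    } else {                                    \
        abort();                                \
    }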