On Wed, Jan 11, 2023 at 12:08:26 +0000, Daniel P. Berrangé wrote:
> First, I find the test to be a little unreliable the first few
> times it is ran. I ran it in a loop 20 times and it got more
> stable results. Looking at just the QTree lines I get something
> typically like:

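Agreed, this is a problem in the benchmarks as written. I've
changed them to do a warm-up run and then repeat each benchmark
until at least 200ms have elapsed (and at least three runs),
averaging the result. That seems to stabilise results on my
machine. (See the appended patch.)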
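
For anyone who wants to try the approach outside of QEMU, here is a
minimal standalone sketch of the same "warm up, then repeat until a
minimum wall time has elapsed" loop. All names in it (now_ns,
mean_ns_per_run, mops_per_sec, sum_run, fixed_cost_run) are made up
for illustration and are not part of qtree-bench.c; it assumes a
POSIX system with clock_gettime().

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback type: runs the workload once, returns elapsed ns. */
typedef int64_t (*timed_fn)(void *opaque);

/* Monotonic clock reading in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * One discarded warm-up run, then repeat until both the minimum total
 * time and the minimum number of runs are reached. Returns mean ns/run.
 */
static double mean_ns_per_run(timed_fn fn, void *opaque,
                              int64_t min_total_ns, int64_t min_runs)
{
    int64_t total_ns = 0;
    int64_t n_runs = 0;

    fn(opaque); /* warm-up run, result ignored */

    while (total_ns < min_total_ns || n_runs < min_runs) {
        total_ns += fn(opaque);
        n_runs++;
    }
    return (double)total_ns / n_runs;
}

/* elements per ns * 1e3 == elements per us == Mops/s */
static double mops_per_sec(size_t n_elems, double ns_per_run)
{
    return n_elems / ns_per_run * 1e3;
}

/* Example workload: times a trivial summation; opaque points to a size_t. */
static int64_t sum_run(void *opaque)
{
    size_t n = *(size_t *)opaque;
    volatile uint64_t acc = 0;
    int64_t start = now_ns();

    for (size_t i = 0; i < n; i++) {
        acc += i;
    }
    return now_ns() - start;
}

/* Stand-in workload: claims a fixed 50ms per run and counts its calls. */
static int64_t fixed_cost_run(void *opaque)
{
    (*(int *)opaque)++;
    return 50000000;
}
```

A caller would do something like
mean_ns_per_run(sum_run, &n, 200000000, 3) and pass the result to
mops_per_sec(n, ...). The fixed-cost stand-in is only there to make
the loop's behaviour easy to check without depending on real timings.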

> One thing to bear in mind is that if setting G_SLICE=always-malloc, we
> should in theory see the exact same results for GTree and QTree.

Not quite the exact same results.
There are a couple of differences that matter in this context:
- We're comparing a shared library, most likely compiled with -fPIC,
  against a static library compiled without -fPIC. I've done some
  tests, and using -fPIC on qtree does make a difference to both the
  generated code and the resulting benchmark performance.
- With G_SLICE=always-malloc we are still going through g_slice,
  which then defers to the system's malloc. This should not matter
  much except in microbenchmarks like these.

That said, I have not been able to get a 1:1 match of perf results
between qtree and gtree, even after compiling qtree with -fPIC,
compiling GLib myself, and modifying qtree to use gslice.
I've also tried both tcmalloc and glibc's malloc.

One thing I didn't try due to lack of time is to make qtree into
a shared library and benchmark that -- I think that would finally
give us identical results.

> So overall if I ignore the unreliable results, my take away is
> that malloc is pretty much always a win over gslice, sometimes
> massively so, but at least shouldn't be worse.
> 
> NB, I'm using Fedora 37 with glibc.  Mileage may vary with different
> libc impls.

I have to agree. I wanted to be honest and share the numbers
I had, but in fairness I didn't spend enough time collecting
them for them to be useful, and getting useful numbers is tricky
when dealing with microbenchmarks.

Thanks,
                E.

---
diff --git a/tests/bench/qtree-bench.c b/tests/bench/qtree-bench.c
index 9cfaf8820e..ed42e73293 100644
--- a/tests/bench/qtree-bench.c
+++ b/tests/bench/qtree-bench.c
@@ -118,9 +118,9 @@ static inline void remove_all(void *tree, enum impl_type impl)
     }
 }
 
-static double run_benchmark(const struct benchmark *bench,
-                            enum impl_type impl,
-                            size_t n_elems)
+static int64_t run_benchmark(const struct benchmark *bench,
+                             enum impl_type impl,
+                             size_t n_elems)
 {
     void *tree;
     size_t *keys;
@@ -212,7 +212,7 @@ static double run_benchmark(const struct benchmark *bench,
     }
     g_free(keys);
 
-    return (double)n_elems / ns * 1e3;
+    return ns;
 }
 
 int main(int argc, char *argv[])
@@ -232,7 +232,20 @@ int main(int argc, char *argv[])
             const struct tree_implementation *impl = &impls[j];
             for (int k = 0; k < ARRAY_SIZE(benchmarks); k++) {
                 const struct benchmark *bench = &benchmarks[k];
-                res[k][j][i] = run_benchmark(bench, impl->type, size);
+
+                /* warm-up run */
+                run_benchmark(bench, impl->type, size);
+
+                int64_t total_ns = 0;
+                int64_t n_runs = 0;
+                while (total_ns < 2e8 || n_runs < 3) {
+                    total_ns += run_benchmark(bench, impl->type, size);
+                    n_runs++;
+                }
+                double ns_per_run = (double)total_ns / n_runs;
+
+                /* Throughput, in Mops/s. */
+                res[k][j][i] = size / ns_per_run * 1e3;
             }
         }
     }

