Hi Josh, all

I'm sitting at an airport, so rather than participating in the comment threads in the doc, I will just post some high-level principles I've derived during my own long career in performance testing.
Infra:
- It's a common myth that you need to use on-premise HW because cloud HW is noisy.
- Most likely the opposite is true: a small cluster of lab hardware runs the risk of some sysadmin with root access manually modifying the servers and leaving them in an inconsistent configuration. On the other hand, a public cloud is configured with infrastructure as code, so every change is tracked in version control.
- Four-part article on how we tuned EC2 at my previous employer:
  1. <https://www.mongodb.com/blog/post/reducing-variability-performance-tests-ec2-setup-key-results>
  2. <https://www.mongodb.com/blog/post/repeatable-performance-tests-ec2-instances-neither-good-nor-bad>
  3. <https://www.mongodb.com/blog/post/repeadtable-performance-tests-ebs-instances-stable-option>
  4. <https://www.mongodb.com/blog/post/repeatable-performance-tests-cpu-options-best-disabled>
- Trust no one, measure everything. For example, don't trust that what I'm writing here is true. Run sysbench against your HW, then you have first-hand observations.
- Using EC2 specifically has the additional benefit that the instance types can be considered well-known, standard HW configurations, more so than any on-premise system.

Performance testing is regression testing:
- Important: run perf tests with the nightly build. Make sure your HW configuration is repeatable, with low variability from day to day.
- Less important / for later:
  - Using complicated benchmarks (TPC-C...) that try to model a real-world app. These can take weeks to develop, each.
  - Having lots of different benchmarks for "coverage".
- Adding the above two together: running a simple key-value test (e.g. YCSB) every night in an automated and repeatable way, and storing the result - whatever is considered relevant - so that you end up with a timeseries, is a great start, and I'd take it over a complicated "representative" benchmark any day. (See the first sketch after these lists.)
- Use change detection to automatically and deterministically flag statistically significant change points (regressions). (See the second sketch after these lists.)
- Literature: Detecting performance regressions with DataStax Hunter <https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4>
- Literature: Fallout: Distributed Systems Testing as a Service <https://www.semanticscholar.org/paper/0cebbfebeab6513e98ad1646cc795cabd5ddad8a> and Automated system performance testing at MongoDB <https://www.connectedpapers.com/main/0cebbfebeab6513e98ad1646cc795cabd5ddad8a/graph>

Common gotchas:
- Testing with a small data set that fits entirely in RAM. A good dataset is 5x the RAM available to the DB process. Or you just test with the size a real production server would be running - at DataStax we have tests that use a 1 TB and a 1.5 TB data set, because those tend to be the standard maximum sizes (per node) at customers.
- The test runtime is too short. What counts as a good test duration depends on the database; the goal is to reach a stable state, which can be hard for an LSM database like Cassandra. For other databases I've worked with, the default is typically to flush every 15 to 60 seconds, and the test duration should be a multiple of that (3 to 5 min).
- Naive comparisons to determine whether a test result is a regression or not. For example, benchmarking the new release against the stable version, one run each, and reporting the result as "fact". Or comparing today's result with yesterday's.
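To make the nightly-timeseries idea concrete, here is a minimal sketch in Python. It's my own illustration, not anything from the doc or from existing tooling: the file name, the chosen metrics and the parsing of YCSB's summary output are all assumptions. The point is just the shape of the data you accumulate - one row per nightly run, keyed by date and git revision, holding whatever metrics you consider relevant.

```python
import csv
import datetime
import os
import re

# Hypothetical location of the accumulated timeseries.
RESULTS_FILE = "nightly_results.csv"
FIELDS = ["date", "git_sha", "throughput_ops", "p99_read_us", "p99_update_us"]


def parse_ycsb_summary(output: str) -> dict:
    """Pull a few headline numbers out of a YCSB summary, which has lines like:
       [OVERALL], Throughput(ops/sec), 51234.5
       [READ], 99thPercentileLatency(us), 1890
    """
    patterns = {
        "throughput_ops": r"\[OVERALL\], Throughput\(ops/sec\), ([\d.]+)",
        "p99_read_us": r"\[READ\], 99thPercentileLatency\(us\), ([\d.]+)",
        "p99_update_us": r"\[UPDATE\], 99thPercentileLatency\(us\), ([\d.]+)",
    }
    metrics = {}
    for name, pattern in patterns.items():
        m = re.search(pattern, output)
        if m:
            metrics[name] = float(m.group(1))
    return metrics


def append_result(git_sha: str, metrics: dict) -> None:
    """Append one data point so the CSV grows into a timeseries, one row per night."""
    write_header = not os.path.exists(RESULTS_FILE)
    with open(RESULTS_FILE, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow({"date": datetime.date.today().isoformat(),
                         "git_sha": git_sha,
                         **metrics})
```

A nightly job would run the YCSB workload against the freshly built binaries, feed its stdout to parse_ycsb_summary(), and call append_result() with that build's git SHA.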
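And a minimal sketch of the change-detection step over that timeseries. To be clear about what is an assumption here: the tools referenced above (Hunter, and the MongoDB/Fallout work) use E-Divisive means; the Welch's t-test over two sliding windows below is only a simplified stand-in to show the principle - the decision "this is a regression" is made deterministically from the whole history, never by eyeballing two individual runs.

```python
import csv
from statistics import mean

from scipy import stats  # assumes SciPy is available for Welch's t-test

WINDOW = 7      # nightly results on each side of a candidate change point
ALPHA = 0.001   # conservative threshold: only flag clearly significant shifts


def load_series(path: str, metric: str) -> list:
    """Read the timeseries written by the previous sketch."""
    with open(path, newline="") as f:
        return [(row["git_sha"], float(row[metric])) for row in csv.DictReader(f)]


def change_points(series: list) -> list:
    """Return (git_sha, mean_before, mean_after) for every flagged shift."""
    flagged = []
    values = [v for _, v in series]
    for i in range(WINDOW, len(values) - WINDOW):
        before = values[i - WINDOW:i]
        after = values[i:i + WINDOW]
        # Welch's t-test: are the means of the two windows significantly different?
        _, p_value = stats.ttest_ind(before, after, equal_var=False)
        if p_value < ALPHA:
            flagged.append((series[i][0], mean(before), mean(after)))
    return flagged


if __name__ == "__main__":
    series = load_series("nightly_results.csv", "throughput_ops")
    for sha, before, after in change_points(series):
        print(f"possible change point at {sha}: {before:.0f} -> {after:.0f} ops/sec")
```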
Building perf testing systems following the above principles has had a lot of positive impact in my projects. For example, at my previous employer we caught 17 significant regressions during the 1-year-long development cycle of the next major version (see my paper above). OTOH, after the GA release, during the next year users reported only 1 significant performance regression. That is to say, the perf testing of nightly builds caught all but one regression in the new major version.

henrik

On Fri, Dec 30, 2022 at 7:41 AM Josh McKenzie <jmcken...@apache.org> wrote:

> There was a really interesting presentation from the Lucene folks at
> ApacheCon about how they're doing perf regression testing. That combined
> with some recent contributors wanting to get involved on some performance
> work and not having much direction or clarity on how to get involved led
> some of us to come together and riff on what we might be able to take away
> from that presentation and context.
>
> Lucene presentation: "Learning from 11+ years of Apache Lucene
> benchmarks":
> https://docs.google.com/presentation/d/1Tix2g7W5YoSFK8jRNULxOtqGQTdwQH3dpuBf4Kp4ouY/edit#slide=id.p
>
> Their nightly indexing benchmark site:
> https://home.apache.org/~mikemccand/lucenebench/indexing.html
>
> I've checked in with a handful of performance minded contributors in early
> December and we came up with a first draft, then some others of us met on
> an adhoc call on the 12/9 (which was recorded; ping on this thread if you'd
> like that linked - I believe Joey Lynch has that).
>
> Here's where we landed after the discussions earlier this month (1st page,
> estimated reading time 5 minutes):
> https://docs.google.com/document/d/1X5C0dQdl6-oGRr9mXVPwAJTPjkS8lyt2Iz3hWTI4yIk/edit#
>
> Curious to hear what other perspectives there are out there on the topic.
>
> Early Happy New Years everyone!
>
> ~Josh
>

--
Henrik Ingo
+358 40 569 7354
<https://www.datastax.com/> <https://twitter.com/DataStaxEng> <https://www.youtube.com/channel/UCqA6zOSMpQ55vvguq4Y0jAg> <https://www.linkedin.com/in/heingo/>