Hey Jon,

This sounds exciting and pretty useful, thanks.
Looking forward to using tlp-stress to validate 15066 performance. Perhaps we could touch base some time next week to pick a comprehensive set of workloads and versions?

On 12 Apr 2019, at 16:34, Jon Haddad <j...@jonhaddad.com> wrote:

> I don't want to derail the discussion about Stabilizing Internode
> Messaging, so I'm starting this as a separate thread. There was a
> comment that Josh made [1] about doing performance testing with real
> clusters as well as a lot of microbenchmarks, and I'm 100% in support
> of this. We've been working on some tooling at TLP for the last
> several months to make this a lot easier. One of the goals has been
> to help improve the 4.0 testing process.
>
> The first tool we have is tlp-stress [2]. It's designed with a "get
> started in 5 minutes" mindset. My goal was to build a stress tool that
> ships with real workloads out of the box that can be easily tweaked,
> similar to how fio allows you to design a disk workload and tweak it
> with parameters. Included are stress workloads that stress LWTs (two
> different types), materialized views, counters, time series, and
> key-value workloads. Each workload can be modified easily to change
> compaction strategies, concurrent operations, and number of partitions.
> We can run workloads for a set number of iterations or a custom
> duration. We've used this *extensively* at TLP to help our customers,
> and most of our blog posts that discuss performance use it as well.
> It exports data in CSV format and automatically sets up Prometheus for
> metrics collection / aggregation. As an example, we were able to
> determine that the compression length set on the paxos tables imposes
> a significant overhead when using the Locking LWT workload, which
> simulates locking and unlocking of rows. See CASSANDRA-15080 for
> details.
>
> We have documentation [3] on the TLP website.
>
> The second tool we've been working on is tlp-cluster [4]. This tool
> is designed to help provision AWS instances for the purposes of
> testing. To be clear, I don't expect, or want, this tool to be used
> for production environments. It's designed to assist with the
> Cassandra build process by generating deb packages or re-using the
> ones that have already been uploaded. Here's a short list of the
> things you'll care about:
>
> 1. Create instances in AWS for Cassandra using any instance size and
>    number of nodes. Also create tlp-stress instances and a box for
>    monitoring.
> 2. Use any available build of Cassandra, with a quick option to change
>    YAML config. For example:
>    tlp-stress use 3.11.4 -c concurrent_writes:256
> 3. Do custom builds just by pointing to a local Cassandra git repo.
>    They can be used the same way as #2.
> 4. tlp-stress is automatically installed on the stress box.
> 5. Everything's installed with pure bash. I considered something more
>    complex, but since this is for development only, it turns out the
>    simplest tool possible works well, and it means it's easily
>    configurable. Just drop in your own bash script named in the
>    XX_script_name.sh format (starting with a number) and it gets run.
> 6. The monitoring box is running Prometheus. It automatically scrapes
>    Cassandra using the Instaclustr metrics library.
> 7. Grafana is also installed automatically. There are a couple of
>    sample graphs there now. We plan on having better default graphs
>    soon.
>
> For the moment it installs Java 8 only, but that should be easily
> fixable to use Java 11 to test ZGC (it's on my radar).
>
> Documentation for tlp-cluster is here [5].
>
> There are still some things to work out in the tool, and we've been
> working hard to smooth out the rough edges. I still haven't announced
> anything WRT tlp-cluster on the TLP blog, because I don't think it's
> quite ready for public consumption, but I think the folks on this list
> are smart enough to see the value in it even if it still has a few
> warts.
>
> I don't consider myself familiar enough with the networking patch to
> give it a full review, but I am qualified to build tools to help test
> it and go through the testing process myself. From what I can tell,
> the patch is moving the codebase in a positive direction, and I'd like
> to help build confidence in it so we can get it merged in.
>
> We'll continue to build out and improve the tooling with the goal of
> making it easier for people to jump into the QA side of things.
>
> Jon
>
> [1] https://lists.apache.org/thread.html/742009c8a77999f4b62062509f087b670275f827d0c1895bf839eece@%3Cdev.cassandra.apache.org%3E
> [2] https://github.com/thelastpickle/tlp-stress
> [3] http://thelastpickle.com/tlp-stress/
> [4] https://github.com/thelastpickle/tlp-cluster
> [5] http://thelastpickle.com/tlp-cluster/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org