I don't want to derail the discussion about Stabilizing Internode Messaging, so I'm starting this as a separate thread. There was a comment that Josh made [1] about doing performance testing with real clusters as well as a lot of microbenchmarks, and I'm 100% in support of this. We've been working on some tooling at TLP for the last several months to make this a lot easier. One of the goals has been to help improve the 4.0 testing process.
The first tool is tlp-stress [2]. It's designed with a "get started in 5 minutes" mindset. My goal was to build a stress tool that ships with real workloads out of the box that can be easily tweaked, similar to how fio lets you design a disk workload and tweak it with parameters. Included are workloads that stress LWTs (two different types), materialized views, counters, time series, and key-value access patterns. Each workload can easily be modified to change the compaction strategy, the number of concurrent operations, and the number of partitions. Workloads can run for a set number of iterations or for a custom duration. We've used this *extensively* at TLP to help our customers, and most of our blog posts that discuss performance use it as well. It exports data in CSV format and automatically sets up Prometheus for metrics collection and aggregation. As an example, we were able to determine that the compression chunk length set on the Paxos tables imposes significant overhead when using the Locking LWT workload, which simulates locking and unlocking of rows. See CASSANDRA-15080 for details. We have documentation [3] on the TLP website.

The second tool we've been working on is tlp-cluster [4]. It's designed to help provision AWS instances for the purposes of testing. To be clear, I don't expect, or want, this tool to be used for production environments. It's designed to assist with the Cassandra build process by generating deb packages or re-using ones that have already been uploaded. Here's a short list of the things you'll care about:

1. Create instances in AWS for Cassandra using any instance size and number of nodes. It also creates tlp-stress instances and a box for monitoring.
2. Use any available build of Cassandra, with a quick option to change YAML config. For example: tlp-cluster use 3.11.4 -c concurrent_writes:256
3. Do custom builds just by pointing to a local Cassandra git repo. They can be used the same way as #2.
4. tlp-stress is automatically installed on the stress box.
5. Everything's installed with pure bash. I considered something more complex, but since this is for development only, it turns out the simplest tool possible works well, and it's easily configurable. Just drop in your own bash script named with a numeric prefix in the XX_script_name.sh format and it gets run.
6. The monitoring box runs Prometheus. It automatically scrapes Cassandra using the Instaclustr metrics library.
7. Grafana is also installed automatically. There are a couple of sample graphs there now, and we plan on having better default graphs soon.

For the moment it only installs Java 8, but that should be easy to change to Java 11 so we can test ZGC (it's on my radar). Documentation for tlp-cluster is here [5].

There are still some things to work out in the tool, and we've been working hard to smooth out the rough edges. I haven't announced tlp-cluster on the TLP blog yet, because I don't think it's quite ready for public consumption, but I think the folks on this list are smart enough to see the value in it even if it still has a few warts.

I don't consider myself familiar enough with the networking patch to give it a full review, but I am qualified to build tools to help test it and to go through the testing process myself. From what I can tell, the patch moves the codebase in a positive direction, and I'd like to help build confidence in it so we can get it merged. We'll continue to build out and improve the tooling with the goal of making it easier for people to jump into the QA side of things.
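The drop-in script convention for tlp-cluster (custom bash scripts named XX_script_name.sh get picked up and run) boils down to "collect, sort by prefix, run in order." Here is a minimal Python sketch of that idea. It is not tlp-cluster's actual implementation, which is bash. The regex, the numeric (rather than lexical) sort, and the function names `ordered_scripts` and `run_scripts` are all hypothetical, chosen only for illustration.

```python
import re
import subprocess
from pathlib import Path

# Hypothetical naming rule: numeric prefix, underscore, a name, then ".sh".
SCRIPT_PATTERN = re.compile(r"^(\d+)_.+\.sh$")


def ordered_scripts(filenames):
    """Return the drop-in scripts from `filenames`, sorted by numeric prefix.

    Names that don't match the XX_script_name.sh pattern are ignored.
    Scripts sharing a prefix are ordered by their full filename.
    """
    matched = []
    for name in filenames:
        m = SCRIPT_PATTERN.match(name)
        if m:
            matched.append((int(m.group(1)), name))
    return [name for _, name in sorted(matched)]


def run_scripts(directory):
    """Run every matching script in `directory` with bash, in prefix order.

    check=True makes subprocess.run raise CalledProcessError on the first
    script that exits non-zero, so later scripts are not run.
    """
    directory = Path(directory)
    names = [p.name for p in directory.iterdir() if p.is_file()]
    for name in ordered_scripts(names):
        subprocess.run(["bash", str(directory / name)], check=True)


if __name__ == "__main__":
    print(ordered_scripts(
        ["10_java.sh", "README.md", "01_base.sh", "20_cassandra.sh", "2_tools.sh"]
    ))
    # prints ['01_base.sh', '2_tools.sh', '10_java.sh', '20_cassandra.sh']
```

Sorting on the parsed integer means "2_tools.sh" runs before "10_java.sh"; a plain bash glob sorts lexically and would put "10_" first. Zero-padding prefixes to two digits, as the XX in the name suggests, makes both orderings agree.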
Jon

[1] https://lists.apache.org/thread.html/742009c8a77999f4b62062509f087b670275f827d0c1895bf839eece@%3Cdev.cassandra.apache.org%3E
[2] https://github.com/thelastpickle/tlp-stress
[3] http://thelastpickle.com/tlp-stress/
[4] https://github.com/thelastpickle/tlp-cluster
[5] http://thelastpickle.com/tlp-cluster/