Re: TLP tools for stress testing and building test clusters in AWS

Aleksey Yeshchenko Fri, 12 Apr 2019 10:10:41 -0700

Hey Jon,

This sounds exciting and pretty useful, thanks.


Looking forward to using tlp-stress for validating 15066 performance.

We should touch base some time next week to pick a comprehensive set of 
workloads and versions, perhaps?


> On 12 Apr 2019, at 16:34, Jon Haddad <j...@jonhaddad.com> wrote:
> 
> I don't want to derail the discussion about Stabilizing Internode
> Messaging, so I'm starting this as a separate thread.  There was a
> comment that Josh made [1] about doing performance testing with real
> clusters as well as a lot of microbenchmarks, and I'm 100% in support
> of this.  We've been working on some tooling at TLP for the last
> several months to make this a lot easier.  One of the goals has been
> to help improve the 4.0 testing process.
> 
> The first tool we have is tlp-stress [2].  It's designed with a "get
> started in 5 minutes" mindset.  My goal was to build a stress tool that
> ships with real workloads out of the box that can be easily tweaked,
> similar to how fio allows you to design a disk workload and tweak it
> with parameters.  Included are workloads that stress LWTs (two
> different types), materialized views, counters, time series, and
> key-value workloads.  Each workload can be modified easily to change
> compaction strategies, concurrent operations, and number of partitions.
> We can run workloads for a set number of iterations or a custom
> duration.  We've used this *extensively* at TLP to help our customers
> and most of our blog posts that discuss performance use it as well.
> It exports data to CSV and automatically sets up Prometheus for
> metrics collection / aggregation.  As an example, we were able to
> determine that the compression length set on the paxos tables imposes
> a significant overhead when using the Locking LWT workload, which
> simulates locking and unlocking of rows.  See CASSANDRA-15080 for
> details.
> 
> We have documentation [3] on the TLP website.
> 
> The second tool we've been working on is tlp-cluster [4].  This tool
> is designed to help provision AWS instances for the purposes of
> testing.  To be clear, I don't expect, or want, this tool to be used
> for production environments.  It's designed to assist with the
> Cassandra build process by generating deb packages or re-using the
> ones that have already been uploaded.  Here's a short list of the
> things you'll care about:
> 
> 1. Create instances in AWS for Cassandra using any instance size and
> number of nodes.  Also create tlp-stress instances and a box for
> monitoring
> 2. Use any available build of Cassandra, with a quick option to change
> YAML config.  For example: tlp-cluster use 3.11.4 -c
> concurrent_writes:256
> 3. Do custom builds just by pointing to a local Cassandra git repo.
> They can be used the same way as #2.
> 4. tlp-stress is automatically installed on the stress box.
> 5. Everything's installed with pure bash.  I considered something more
> complex, but since this is for development only, it turns out the
> simplest tool possible works well and it means it's easily
> configurable.  Just drop in your own bash script starting with a
> number in a XX_script_name.sh format and it gets run.
> 6. The monitoring box is running Prometheus.  It auto scrapes
> Cassandra using the Instaclustr metrics library.
> 7. Grafana is also installed automatically.  There are a couple of
> sample graphs there now.  We plan on having better default graphs soon.
> 
> For the moment it installs Java 8 only but that should be easily
> fixable to use Java 11 to test ZGC (it's on my radar).
> 
> Documentation for tlp-cluster is here [5].
> 
> There are still some things to work out in the tool, and we've been
> working hard to smooth out the rough edges.  I still haven't announced
> anything WRT tlp-cluster on the TLP blog, because I don't think it's
> quite ready for public consumption, but I think the folks on this list
> are smart enough to see the value in it even if it has a few warts
> still.
> 
> I don't consider myself familiar enough with the networking patch to
> give it a full review, but I am qualified to build tools to help test
> it and go through the testing process myself.  From what I can tell
> the patch is moving the codebase in a positive direction and I'd like
> to help build confidence in it so we can get it merged in.
> 
> We'll continue to build out and improve the tooling with the goal of
> making it easier for people to jump into the QA side of things.
> 
> Jon
> 
> [1] 
> https://lists.apache.org/thread.html/742009c8a77999f4b62062509f087b670275f827d0c1895bf839eece@%3Cdev.cassandra.apache.org%3E
> [2] https://github.com/thelastpickle/tlp-stress
> [3] http://thelastpickle.com/tlp-stress/
> [4] https://github.com/thelastpickle/tlp-cluster
> [5] http://thelastpickle.com/tlp-cluster/
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 
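
Item 5 in the quoted message describes provisioning via bash scripts named in an XX_script_name.sh format, where the numeric prefix decides run order. The sketch below illustrates that general convention only. It is not taken from tlp-cluster, and the function name run_numbered_scripts and the directory argument are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run every NN_name.sh script in a directory,
# ordered by its two-digit prefix. Not tlp-cluster's actual code.
run_numbered_scripts() {
    local dir="$1"
    local script
    # Glob results are sorted, so zero-padded prefixes (01_, 02_, 10_)
    # execute in numeric order.
    for script in "$dir"/[0-9][0-9]_*.sh; do
        # With no matches the pattern stays literal; skip it.
        [ -e "$script" ] || continue
        echo "running $(basename "$script")"
        bash "$script"
    done
}
```

Calling run_numbered_scripts on a directory (for example a hypothetical ./provisioning) runs 01_first.sh before 20_second.sh and ignores files without a two-digit prefix. Zero-padding matters here: an unpadded 2_foo.sh would not match the pattern at all.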


