Hey all, I've set up a Zoom call for 9AM Pacific time. Everyone's welcome to join.
https://zoom.us/j/189920888

Looking forward to a good discussion on how we can all pitch in on getting 4.0 out the door.

Jon

On Sat, Apr 13, 2019 at 9:14 AM Jonathan Koppenhofer <j...@koppedomain.com> wrote:
>
> Wednesday would work for me.
>
> We use and (slightly) contribute to the TLP tools. We are platform testing and beginning 4.0 testing ourselves, so an in-person overview would be great!
>
> On Sat, Apr 13, 2019, 8:48 AM Aleksey Yeshchenko <alek...@apple.com.invalid> wrote:
>>
>> Either Wednesday or Thursday at 9 AM Pacific WFM.
>>
>>> On 13 Apr 2019, at 13:31, Stefan Miklosovic <stefan.mikloso...@instaclustr.com> wrote:
>>>
>>> Hi Jon,
>>>
>>> I would like to be on that call too, but I am off on Thursday.
>>>
>>> I am from Australia, so 5 pm London time is 2 am the next day for us; your Wednesday morning is my Thursday night. Early Wednesday morning for me, which is your Tuesday morning and London's afternoon, would be best.
>>>
>>> Recording the call would definitely be helpful too.
>>>
>>> On Sat, 13 Apr 2019 at 07:45, Jon Haddad <j...@jonhaddad.com> wrote:
>>>>
>>>> I'd be more than happy to hop on a call next week to give you both (and anyone else interested) a tour of our dev tools. Maybe something early morning on my end, which should be your evening, could work?
>>>>
>>>> I can set up a Zoom conference to get everyone acquainted. We can record and post it for anyone who can't make it.
>>>>
>>>> I'm thinking Tuesday, Wednesday, or Thursday morning, 9 AM Pacific (5 pm London)? If anyone's interested, please reply with the dates that work for you. I'll be sure to post the details back here with the Zoom link, in case anyone who didn't get a chance to reply wants to join, as well as a link to the recorded call.
>>>>
>>>> Jon
>>>>
>>>> On Fri, Apr 12, 2019 at 10:41 AM Benedict Elliott Smith <bened...@apache.org> wrote:
>>>>>
>>>>> +1
>>>>>
>>>>> I'm also just as excited to see some standardised workloads and a test bed. At the moment we're benefiting from some large contributors doing their own proprietary performance testing, which is super valuable and something we've lacked before. But I'm also keen to see some more representative workloads, reproducible by anybody in the community, take shape.
>>>>>
>>>>>> On 12 Apr 2019, at 18:09, Aleksey Yeshchenko <alek...@apple.com.INVALID> wrote:
>>>>>>
>>>>>> Hey Jon,
>>>>>>
>>>>>> This sounds exciting and pretty useful, thanks.
>>>>>>
>>>>>> Looking forward to using tlp-stress for validating 15066 performance.
>>>>>>
>>>>>> We should touch base some time next week to pick a comprehensive set of workloads and versions, perhaps?
>>>>>>
>>>>>>> On 12 Apr 2019, at 16:34, Jon Haddad <j...@jonhaddad.com> wrote:
>>>>>>>
>>>>>>> I don't want to derail the discussion about Stabilizing Internode Messaging, so I'm starting this as a separate thread. There was a comment Josh made [1] about doing performance testing with real clusters as well as a lot of microbenchmarks, and I'm 100% in support of this. We've been working on some tooling at TLP for the last several months to make this a lot easier. One of the goals has been to help improve the 4.0 testing process.
>>>>>>>
>>>>>>> The first tool we have is tlp-stress [2]. It's designed with a "get started in 5 minutes" mindset.
>>>>>>> My goal was to ship a stress tool that comes with real workloads out of the box and can be easily tweaked, similar to how fio allows you to design a disk workload and tweak it with parameters. Included are workloads that stress LWTs (two different types), materialized views, counters, time series, and key-value access patterns. Each workload can be easily modified to change compaction strategies, concurrent operations, and the number of partitions. We can run workloads for a set number of iterations or for a custom duration. We've used this *extensively* at TLP to help our customers, and most of our blog posts that discuss performance use it as well. It exports data to CSV and automatically sets up Prometheus for metrics collection and aggregation. As an example, we were able to determine that the compression length set on the paxos tables imposes a significant overhead when using the Locking LWT workload, which simulates locking and unlocking of rows. See CASSANDRA-15080 for details.
>>>>>>>
>>>>>>> We have documentation [3] on the TLP website.
>>>>>>>
>>>>>>> The second tool we've been working on is tlp-cluster [4]. This tool is designed to help provision AWS instances for testing. To be clear, I don't expect, or want, this tool to be used for production environments. It's designed to assist with the Cassandra build process by generating deb packages or re-using ones that have already been uploaded. Here's a short list of the things you'll care about:
>>>>>>>
>>>>>>> 1. Create instances in AWS for Cassandra using any instance size and number of nodes. Also create tlp-stress instances and a box for monitoring.
>>>>>>> 2. Use any available build of Cassandra, with a quick option to change the YAML config. For example: tlp-cluster use 3.11.4 -c concurrent_writes:256
>>>>>>> 3. Do custom builds just by pointing at a local Cassandra git repo. They can be used the same way as #2.
>>>>>>> 4. tlp-stress is automatically installed on the stress box.
>>>>>>> 5. Everything's installed with pure bash. I considered something more complex, but since this is for development only, the simplest possible tool works well and keeps things easily configurable. Just drop in your own bash script named with a leading number, in an XX_script_name.sh format, and it gets run.
>>>>>>> 6. The monitoring box runs Prometheus, which auto-scrapes Cassandra using the Instaclustr metrics library.
>>>>>>> 7. Grafana is also installed automatically. There are a couple of sample graphs there now; we plan on having better default graphs soon.
>>>>>>>
>>>>>>> For the moment it installs Java 8 only, but that should be easy to change to Java 11 for testing ZGC (it's on my radar).
>>>>>>>
>>>>>>> Documentation for tlp-cluster is here [5].
>>>>>>>
>>>>>>> There are still some things to work out in the tool, and we've been working hard to smooth out the rough edges.
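To give a concrete feel for the provisioning workflow in the list above, a typical session looks roughly like the minimal sketch below. The subcommand spellings and the init arguments are illustrative assumptions rather than exact syntax, and may not match the current version; the tlp-cluster docs [5] are the authoritative reference.

    # illustrative session; names, ticket, and arguments are placeholders
    tlp-cluster init MyTeam CASSANDRA-0000 "4.0 messaging testing"   # scaffold a working directory for this cluster
    tlp-cluster up                                                   # provision the AWS instances (Cassandra, stress, monitoring)
    tlp-cluster use 3.11.4 -c concurrent_writes:256                  # pick a build and tweak cassandra.yaml
    tlp-cluster install                                              # push the build and config to the nodes
    tlp-cluster start                                                # start Cassandra across the cluster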
>>>>>>> I still haven't announced anything WRT tlp-cluster on the TLP blog, because I don't think it's quite ready for public consumption, but I think the folks on this list are smart enough to see the value in it even if it still has a few warts.
>>>>>>>
>>>>>>> I don't consider myself familiar enough with the networking patch to give it a full review, but I am qualified to build tools to help test it and to go through the testing process myself. From what I can tell, the patch is moving the codebase in a positive direction, and I'd like to help build confidence in it so we can get it merged.
>>>>>>>
>>>>>>> We'll continue to build out and improve the tooling, with the goal of making it easier for people to jump into the QA side of things.
>>>>>>>
>>>>>>> Jon
>>>>>>>
>>>>>>> [1] https://lists.apache.org/thread.html/742009c8a77999f4b62062509f087b670275f827d0c1895bf839eece@%3Cdev.cassandra.apache.org%3E
>>>>>>> [2] https://github.com/thelastpickle/tlp-stress
>>>>>>> [3] http://thelastpickle.com/tlp-stress/
>>>>>>> [4] https://github.com/thelastpickle/tlp-cluster
>>>>>>> [5] http://thelastpickle.com/tlp-cluster/
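And to make the workload tweaking described for tlp-stress above equally concrete, a run looks roughly like the minimal sketch below. The workload names and flag spellings are illustrative and may vary by version; "tlp-stress run --help" and the docs [3] are the authoritative reference.

    # illustrative runs; flag spellings are approximate
    tlp-stress run KeyValue --partitions 100k --iterations 10M \
        --compaction "{'class': 'LeveledCompactionStrategy'}"   # key-value workload, LCS instead of the default
    tlp-stress run Locking --duration 1h                         # LWT locking workload, run for a fixed duration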