Folks,

First, I've created PR [1] with ducktests improvements. The PR contains the following changes:
- PME-free switch proof benchmark (2.7.6 vs master)
- Ability to check against (compare with) previous releases (e.g. 2.7.6 & 2.8)
- Global refactoring:
-- benchmark Java code simplification
-- deduplication of the services' Python and Java class code
-- fail-fast checks for Java and Python (e.g. an application must explicitly report that it finished successfully)
-- simple extraction of results from tests and benchmarks
-- Java code is now configurable from tests/benchmarks
-- proper SIGTERM handling in the Java code (e.g. it can finish the last operation and log results)
-- the Docker volume is now marked as delegated, to increase execution speed for Mac & Windows users
-- Ignite cluster nodes now start in parallel (start-up speed-up)
-- Ignite can be configured per test/benchmark
- Full and per-module assembly scripts added
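To give a flavor of the test-side API, here is a minimal sketch of a version-parametrized ducktape test. The decorators and base classes are ducktape's own; IgniteService below is an illustrative stand-in, not the exact class from the PR:

    from ducktape.mark import parametrize
    from ducktape.mark.resource import cluster
    from ducktape.services.service import Service
    from ducktape.tests.test import Test


    class IgniteService(Service):
        """Illustrative stand-in for the Ignite service from the PR."""

        def __init__(self, context, num_nodes, version):
            super(IgniteService, self).__init__(context, num_nodes=num_nodes)
            self.version = version

        def start_node(self, node):
            pass  # a real service would launch ignite.sh for self.version here

        def stop_node(self, node):
            pass  # a real service would send SIGTERM and let the node log results

        def clean_node(self, node):
            pass  # a real service would remove work and log directories


    class RebalanceBenchmarkSketch(Test):
        # Run the same scenario against a previous release and master.
        @cluster(num_nodes=3)
        @parametrize(version="2.7.6")
        @parametrize(version="dev")
        def test_rebalance(self, version):
            ignite = IgniteService(self.test_context, num_nodes=3, version=version)
            ignite.start()
            # ... load data, trigger rebalance, extract results for the report ...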
Second, I'd like to propose accepting ducktests [2] (the ducktape integration) as the target tool for PoC checks and real-topology benchmarking.

Ducktape pros:
- Developed for distributed systems by distributed-systems developers.
- Developed since 2014, stable.
- Usability proven by its use in Kafka.
- Dozens of tests and benchmarks in Kafka serve as a great example pack.
- Built-in Docker support for rapid development and checks.
- Great for CI automation.

As additional motivation, at least 3 teams:
- the IEP-45 team (to check crash-recovery speed-up: discovery and Zabbix speed-up),
- the Ignite SE Plugins team (to check that plugin features do not slow down or break AI features),
- the Ignite SE QA team (to add already developed smoke/load/failover tests to the AI codebase)
are now waiting for the ducktests merge to start checking, the AI way, the cases they are working on.

Thoughts?

[1] https://github.com/apache/ignite/pull/7967
[2] https://github.com/apache/ignite/tree/ignite-ducktape

On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <nizhi...@apache.org> wrote:

> Hello, Maxim.
>
> Thank you for such a detailed explanation.
>
> Can we put the content of this discussion somewhere on the wiki,
> so it doesn't get lost?
>
> I've divided the answer into several parts, from the requirements to the
> implementation.
> So, if we agree on the requirements, we can proceed with the discussion
> of the implementation.
>
> 1. Requirements:
>
> The main goal I want to achieve is *reproducibility* of the tests.
> I'm sick and tired of the zillions of flaky, rarely failing, and almost
> never failing tests in the Ignite codebase.
> We should start with the simplest scenarios, which will be as reliable
> as steel :)
>
> I want to know for sure:
> - Does this PR make rebalance quicker or not?
> - Does this PR make PME quicker or not?
>
> So, your description of the complex test scenario looks like a next step
> to me.
> Anyway, it's cool that we already have one.
>
> The second goal is to have a strict test lifecycle, as we have in JUnit
> and similar frameworks.
>
> > It covers production-like deployment and running scenarios over a
> > single database instance.
>
> Do you mean «single cluster» or «single host»?
>
> 2. Existing tests:
>
> > A Combinator suite allows running a set of operations concurrently
> > over a given database instance.
> > A Consumption suite allows running a set of production-like actions
> > over a given set of Ignite/GridGain versions and comparing test metrics
> > across versions.
> > A Yardstick suite.
> > A Stress suite that simulates hardware environment degradation.
> > Ultimate, DR and Compatibility suites that perform functional
> > regression testing.
> > Regression.
>
> Great news that we already have so many choices for testing!
> A mature test base is a big +1 for Tiden.
>
> 3. Comparison:
>
> > Criteria: Test configuration
> > Ducktape: single JSON string for all tests.
> > Tiden: any number of YAML config files, command-line options for
> > fine-grained test configuration, ability to select/modify test behavior
> > based on the Ignite version.
>
> 1. Many YAML files can be hard to maintain.
> 2. In ducktape, you can set parameters via the «--parameters» option.
> Please take a look at the doc [1].
>
> > Criteria: Cluster control
> > Tiden: additionally can address the cluster as a whole and execute
> > remote commands in parallel.
>
> It seems we have already implemented this ability in the PoC.
>
> > Criteria: Test assertions
> > Tiden: simple asserts, plus a few customized assertion helpers.
> > Ducktape: simple asserts.
>
> Can you please be more specific?
> What helpers do you have in mind?
> Ducktape has asserts that wait for log-file messages or for some process
> to finish.
>
> > Criteria: Test reporting
> > Ducktape: limited to its own text/HTML format.
>
> Ducktape has
> 1. A text reporter.
> 2. A customizable HTML reporter.
> 3. A JSON reporter.
>
> We can render the JSON with any template or tool.
>
> > Criteria: Provisioning and deployment
> > Ducktape: can provision a subset of hosts from the cluster for test
> > needs. However, that means the test can't be scaled without test code
> > changes. Does not do any deployment; relies on external means, e.g.
> > pre-packaged in a docker image, as in the PoC.
>
> This is not true.
>
> 1. We can set explicit test parameters (node numbers) via parameters.
> We can increase the client count or the cluster size without test code
> changes.
>
> 2. We have many choices for the test environment. These choices are
> tested and used in other projects:
> * docker
> * vagrant
> * private cloud (ssh access)
> * ec2
> Please take a look at the Kafka documentation [2].
>
> > I can continue more on this, but it should be enough for now.
>
> We need to go deeper! :)
>
> [1] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
> [2] https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
>
> > On June 9, 2020, at 17:25, Max A. Shonichev <mshon...@yandex.ru>
> > wrote:
> >
> > Greetings, Nikolay,
> >
> > First of all, thank you for your great effort preparing a PoC of
> > integration testing for the Ignite community.
> >
> > It's a shame Ignite did not have at least some such tests yet. However,
> > GridGain, as a major contributor to Apache Ignite, has had a profound
> > collection of in-house tools for integration and performance testing
> > for years already, and while we were slowly considering sharing our
> > expertise with the community, your initiative makes us drive that
> > process a bit faster. Thanks a lot!
> >
> > I reviewed your PoC and want to share a little about what we do on our
> > part, why and how; I hope it will help the community take the proper
> > course.
> >
> > First I'll give a brief overview of what decisions we made and what we
> > have in our private code base, next I'll describe what we have already
> > donated to the public and what we plan to publish next, and then I'll
> > compare both approaches, highlighting deficiencies, in order to spur
> > public discussion on the matter.
> >
> > It might seem strange to use Python to run Bash to run Java
> > applications, because that introduces the IT industry's 'best of breed'
> > – the Python dependency hell – into the Java application code base. The
> > only stranger decision one could make is to use Maven to run Docker to
> > run Bash to run Python to run Bash to run Java, but desperate times
> > call for desperate measures, I guess.
> >
> > There are Java-based solutions for integration testing, e.g.
> > Testcontainers [1], Arquillian [2], etc., and they might suit Ignite
> > community CI pipelines by themselves. But we also wanted to run
> > performance tests and benchmarks, like the dreaded PME benchmark, and
> > that is solved by a totally different set of tools in the Java world,
> > e.g. JMeter [3], JMH [4], Gatling [5], etc.
> >
> > Speaking specifically about benchmarking, the Apache Ignite community
> > already has Yardstick [6], and there's nothing wrong with writing a PME
> > benchmark using Yardstick, but we also wanted to be able to run
> > scenarios like this:
> > - put an X load on an Ignite database;
> > - perform a set of operations Y to check how Ignite copes with
> > operations under load.
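Such an "operations under load" scenario could be sketched in ducktape terms roughly as follows; every helper below is a placeholder shown only to illustrate the shape of the scenario, not an existing API:

    import time

    from ducktape.tests.test import Test


    class OpsUnderLoadSketch(Test):
        """Skeleton of the 'perform operations Y under load X' scenario."""

        def test_operation_under_load(self):
            load = self._start_background_load()   # X: steady put/get load
            try:
                started = time.time()
                self._run_checked_operation()      # Y: e.g. trigger a PME
                elapsed = time.time() - started
            finally:
                load.stop()
            # A value returned from a ducktape test ends up in its result data.
            return {"operation_under_load_sec": elapsed}

        # Placeholders: a real test would start a load application service
        # and drive the operation through an Ignite client.
        def _start_background_load(self):
            class _NoopLoad(object):
                def stop(self):
                    pass
            return _NoopLoad()

        def _run_checked_operation(self):
            pass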
> >
> > And yes, we also wanted the applications under test to be deployed
> > 'like in production', e.g. distributed over a set of hosts. This raises
> > questions about provisioning and node affinity, which I'll cover in
> > detail later.
> >
> > So we decided to put in a little effort to build a simple tool covering
> > different integration and performance scenarios, and our QA lab's first
> > attempt was PoC-Tester [7], currently open source for everything but
> > the reporting web UI. It's a quite simple to use, 95% Java-based tool
> > targeted at the pre-release QA stage.
> >
> > It covers production-like deployment and running scenarios over a
> > single database instance. PoC-Tester scenarios consist of a sequence of
> > tasks running sequentially or in parallel. After all tasks complete, or
> > at any time during the test, the user can run a log collection task;
> > logs are checked for exceptions, and a summary of the found issues and
> > task ops/latency statistics is generated at the end of the scenario.
> > One of the main PoC-Tester features is its fire-and-forget approach to
> > task management. That is, you can deploy a grid and leave it running
> > for weeks, periodically firing some tasks at it.
> >
> > During the earliest stages of PoC-Tester development it became quite
> > clear that Java application development is a tedious process and that
> > the architecture decisions you take during development are slow and
> > hard to change.
> > For example, scenarios like this:
> > - deploy two instances of GridGain with master-slave data replication
> > configured;
> > - put a load on the master;
> > - perform checks on the slave,
> > or like this:
> > - preload 1 TB of data, using your favorite tool of choice, into an
> > Apache Ignite of version X;
> > - run a set of functional tests on Apache Ignite version Y over the
> > preloaded data,
> > do not fit well into the PoC-Tester workflow.
> >
> > So, this is why we decided to use Python as our generic scripting
> > language of choice.
> >
> > Pros:
> > - quicker prototyping and development cycles
> > - easier to find a DevOps/QA engineer with Python skills than one with
> > Java skills
> > - used extensively all over the world for DevOps/CI pipelines, and thus
> > has a rich set of libraries for all possible integration use cases
> >
> > Cons:
> > - a nightmare with dependencies; better stick to specific
> > language/library versions
> >
> > Comparing alternatives for a Python-based testing framework, we
> > considered the following requirements, somewhat similar to what you
> > mentioned from Confluent [8] previously:
> > - should be able to run locally or distributed (bare metal or in the
> > cloud)
> > - should have built-in deployment facilities for the applications under
> > test
> > - should separate test configuration from test code
> > -- be able to easily reconfigure tests by simple configuration changes
> > -- be able to easily scale the test environment by simple configuration
> > changes
> > -- be able to perform regression testing by simply switching the
> > artifacts under test via configuration
> > -- be able to run tests with different JDK versions by simple
> > configuration changes
> > - should have human-readable reports and/or reporting tools integration
> > - should allow simple test progress monitoring; one does not want to
> > run a 6-hour test only to find out that the application actually
> > crashed during the first hour
> > - should allow parallel execution of test actions
> > - should have a clean API for test writers:
> > -- a clean API for distributed remote command execution
> > -- a clean API for starting / stopping deployed applications and other
> > operations
> > -- a clean API for performing checks on results
> > - should be open source, or at least the source code should allow easy
> > change or extension
> >
> > Back at that time we found no better alternative than to write our own
> > framework, and here goes Tiden [9], GridGain's framework of choice for
> > functional integration and performance testing.
> >
> > Pros:
> > - solves all the requirements above
> > Cons (for Ignite):
> > - (currently) closed GridGain source
> >
> > On top of Tiden we've built a set of test suites, some of which you
> > might have heard of already.
> >
> > A Combinator suite allows running a set of operations concurrently over
> > a given database instance. Proven to have found at least 30+ race
> > conditions and NPE issues.
> >
> > A Consumption suite allows running a set of production-like actions
> > over a given set of Ignite/GridGain versions and comparing test metrics
> > across versions: heap/disk/CPU consumption and the time to perform
> > actions such as client PME, server PME, rebalancing, data replication,
> > etc.
> >
> > A Yardstick suite is a thin layer of Python glue code to run the Apache
> > Ignite pre-release benchmark set. Yardstick itself has mediocre
> > deployment capabilities; Tiden solves this easily.
> >
> > A Stress suite simulates hardware environment degradation during
> > testing.
> >
> > Ultimate, DR and Compatibility suites perform functional regression
> > testing of GridGain Ultimate Edition features like snapshots, security,
> > data replication, rolling upgrades, etc.
> >
> > A Regression suite and some IEP testing suites, like IEP-14, IEP-15,
> > etc.
> >
> > Most of the suites above use another in-house developed Java tool –
> > PiClient – to perform the actual loading and miscellaneous operations
> > on the Ignite under test. We use the py4j Python-Java gateway library
> > to control PiClient instances from the tests.
> >
> > When we considered CI, we put TeamCity out of scope, because
> > distributed integration and performance tests tend to run for hours,
> > and TeamCity agents are a scarce and costly resource. So, bundled with
> > Tiden there are jenkins-job-builder [10] based CI pipelines and Jenkins
> > xUnit reporting. Also, a rich web UI tool, Ward, aggregates test run
> > reports across versions and has built-in visualization support for the
> > Combinator suite.
> >
> > All of the above is currently closed source, but we plan to make it
> > public for the community, and publishing the Tiden core [9] is the
> > first step on that way. You can review some examples of using Tiden for
> > tests in my repository [11], for a start.
> >
> > Now, let's compare the ducktape PoC and Tiden.
> >
> > Criteria: Language
> > Tiden: Python 3.7.
> > Ducktape: Python; advertises itself as Python 2.7, 3.6 and 3.7
> > compatible, but actually can't work with Python 3.7 due to a broken ZMQ
> > dependency.
> > Comment: Python 3.7 has much better support for async-style code, which
> > might be crucial for distributed application testing.
> > Score: Tiden: 1, Ducktape: 0
> >
> > Criteria: Test writers API
> > The supported integration test framework concepts are basically the
> > same:
> > - a test controller (test runner)
> > - a cluster
> > - a node
> > - an application (a service in ducktape terms)
> > - a test
> > Score: Tiden: 5, Ducktape: 5
> >
> > Criteria: Test selection and run
> > Ducktape: suite-package-class-method level selection; an internal
> > scheduler allows running the tests in a suite in parallel.
> > Tiden: also suite-package-class-method level selection; additionally
> > allows selecting a subset of tests by attribute; parallel runs are not
> > built in, but it allows merging test reports from different runs.
> > Score: Tiden: 2, Ducktape: 2
> >
> > Criteria: Test configuration
> > Ducktape: single JSON string for all tests.
> > Tiden: any number of YAML config files, command-line options for
> > fine-grained test configuration, ability to select/modify test behavior
> > based on the Ignite version.
> > Score: Tiden: 3, Ducktape: 1
> >
> > Criteria: Cluster control
> > Ducktape: allows executing remote commands at node granularity.
> > Tiden: additionally can address the cluster as a whole and execute
> > remote commands in parallel.
> > Score: Tiden: 2, Ducktape: 1
> >
> > Criteria: Logs control
> > Both frameworks have similar built-in support for remote log collection
> > and grepping. Tiden has a built-in plugin that can zip and collect
> > arbitrary log files from arbitrary locations at test/module/suite
> > granularity, unzipping them if needed, plus an application API to
> > search / wait for messages in logs. Ducktape lets each service declare
> > its log file locations (seemingly does not support log rolling), with a
> > single entry point to collect service logs.
> > Score: Tiden: 1, Ducktape: 1
> >
> > Criteria: Test assertions
> > Tiden: simple asserts, plus a few customized assertion helpers.
> > Ducktape: simple asserts.
> > Score: Tiden: 2, Ducktape: 1
> >
> > Criteria: Test reporting
> > Ducktape: limited to its own text/HTML format.
> > Tiden: provides a text report, a YAML report for reporting tools
> > integration, and an XML xUnit report for integration with
> > Jenkins/TeamCity.
> > Score: Tiden: 3, Ducktape: 1
> >
> > Criteria: Provisioning and deployment
> > Ducktape: can provision a subset of hosts from the cluster for test
> > needs. However, that means the test can't be scaled without test code
> > changes. Does not do any deployment; relies on external means, e.g.
> > pre-packaged in a docker image, as in the PoC.
> > Tiden: given a set of hosts, Tiden uses all of them for the test.
> > Provisioning should be done by external means. However, it provides
> > conventional automated deployment routines.
> > Score: Tiden: 1, Ducktape: 1
> >
> > Criteria: Documentation and extensibility
> > Tiden: the current API documentation is limited; that should change as
> > we go open source. Tiden is easily extensible via hooks and plugins;
> > see the example Maven plugin and Gatling application at [11].
> > Ducktape: basic documentation at readthedocs.io. The codebase is rigid,
> > the framework core is tightly coupled and hard to change. The only
> > possible extension mechanism is fork-and-rewrite.
> > Score: Tiden: 2, Ducktape: 1
> >
> > I can continue more on this, but it should be enough for now.
> > Overall score: Tiden: 22, Ducktape: 14.
> >
> > Time for discussion!
> >
> > ---
> > [1] - https://www.testcontainers.org/
> > [2] - http://arquillian.org/guides/getting_started/
> > [3] - https://jmeter.apache.org/index.html
> > [4] - https://openjdk.java.net/projects/code-tools/jmh/
> > [5] - https://gatling.io/docs/current/
> > [6] - https://github.com/gridgain/yardstick
> > [7] - https://github.com/gridgain/poc-tester
> > [8] - https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
> > [9] - https://github.com/gridgain/tiden
> > [10] - https://pypi.org/project/jenkins-job-builder/
> > [11] - https://github.com/mshonichev/tiden_examples
> >
> > On 25.05.2020 11:09, Nikolay Izhikov wrote:
> >> Hello,
> >>
> >> The branch with ducktape has been created -
> >> https://github.com/apache/ignite/tree/ignite-ducktape
> >>
> >> Anyone who is willing to contribute to the PoC is welcome.
> >>
> >>
> >>> On May 21, 2020, at 22:33, Nikolay Izhikov <nizhikov....@gmail.com>
> >>> wrote:
> >>>
> >>> Hello, Denis.
> >>>
> >>> There is no rush with these improvements.
> >>> We can wait for Maxim's proposal and compare the two solutions :)
> >>>
> >>>> On May 21, 2020, at 22:24, Denis Magda <dma...@apache.org> wrote:
> >>>>
> >>>> Hi Nikolay,
> >>>>
> >>>> Thanks for kicking off this conversation and sharing your findings
> >>>> with the results. That's the right initiative. I do agree that Ignite
> >>>> needs to have an integration testing framework with the capabilities
> >>>> you listed.
> >>>>
> >>>> As we discussed privately, I would only check whether, instead of
> >>>> Confluent's ducktape library, we can use an integration testing
> >>>> framework developed by GridGain for testing Ignite/GridGain clusters.
> >>>> That framework has been battle-tested and might be more convenient
> >>>> for Ignite-specific workloads. Let's wait for @Maksim Shonichev
> >>>> <mshonic...@gridgain.com>, who promised to join this thread once he
> >>>> finishes preparing the usage examples of the framework. To my
> >>>> knowledge, Max has already been working on that for several days.
> >>>>
> >>>> -
> >>>> Denis
> >>>>
> >>>>
> >>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov <nizhi...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Hello, Igniters.
> >>>>>
> >>>>> I created a PoC [1] for the integration tests of Ignite.
> >>>>>
> >>>>> Let me briefly explain the gap I want to cover:
> >>>>>
> >>>>> 1. For now, we don't have a solution for automated testing of Ignite
> >>>>> on a «real cluster».
> >>>>> By «real cluster» I mean a cluster «like in production»:
> >>>>> * client and server nodes deployed on different hosts,
> >>>>> * thin clients performing queries from some other hosts,
> >>>>> * etc.
> >>>>>
> >>>>> 2. We don't have a solution for automated benchmarks of internal
> >>>>> Ignite processes:
> >>>>> * PME,
> >>>>> * rebalance.
> >>>>> This means we don't know: is rebalance (or PME) in 2.7.0 faster or
> >>>>> slower than in 2.8.0 for the same cluster?
> >>>>>
> >>>>> 3. We don't have a solution for automated testing of Ignite
> >>>>> integrations in a real-world environment:
> >>>>> the Ignite-Spark integration can be taken as an example.
> >>>>> I think some ML solutions should also be tested in real-world
> >>>>> deployments.
> >>>>>
> >>>>> Solution:
> >>>>>
> >>>>> I propose to use the ducktape library from Confluent (Apache 2.0
> >>>>> license).
> >>>>> I tested it both on a real cluster (Yandex Cloud) and in a local
> >>>>> environment (docker), and it works just fine.
> >>>>>
> >>>>> The PoC contains the following services:
> >>>>>
> >>>>> * A simple rebalance test:
> >>>>> start 2 server nodes,
> >>>>> create some data with an Ignite client,
> >>>>> start one more server node,
> >>>>> wait for rebalance to finish.
> >>>>> * A simple Ignite-Spark integration test:
> >>>>> start 1 Spark master and 1 Spark worker,
> >>>>> start 1 Ignite server node,
> >>>>> create some data with an Ignite client,
> >>>>> check the data in an application that queries it from Spark.
> >>>>>
> >>>>> All tests are fully automated.
> >>>>> Log collection works just fine.
> >>>>> You can see an example of the test report - [4].
> >>>>>
> >>>>> Pros:
> >>>>>
> >>>>> * Ability to test local changes (no need to publish changes to some
> >>>>> remote repository or similar).
> >>>>> * Ability to parametrize the test environment (run the same tests
> >>>>> on different JDKs, JVM params, configs, etc.).
> >>>>> * Isolation by default, so system tests are as reliable as possible.
> >>>>> * Utilities for easily pulling up and tearing down services in
> >>>>> clusters in different environments (e.g. local, custom cluster,
> >>>>> Vagrant, K8s, Mesos, Docker, cloud providers, etc.).
> >>>>> * Easy to write unit tests for distributed systems.
> >>>>> * Adopted and successfully used by another distributed open source
> >>>>> project - Apache Kafka.
> >>>>> * Collects results (e.g. logs, console output).
> >>>>> * Reports results (e.g. expected conditions met, performance
> >>>>> results, etc.).
> >>>>>
> >>>>> WDYT?
> >>>>>
> >>>>> [1] https://github.com/nizhikov/ignite/pull/15
> >>>>> [2] https://github.com/confluentinc/ducktape
> >>>>> [3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
> >>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
> >
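For reference, the "simple rebalance test" described in the quoted PoC message maps onto ducktape roughly like this (a sketch: wait_until is real ducktape API, while the helper methods stand in for the PoC's services):

    from ducktape.tests.test import Test
    from ducktape.utils.util import wait_until


    class SimpleRebalanceSketch(Test):
        """Rough shape of the PoC's rebalance scenario."""

        def test_rebalance(self):
            self._start_server_nodes(2)           # 1. start two server nodes
            self._load_data()                     # 2. create data with an Ignite client
            self._start_server_nodes(1)           # 3. start one more server node
            wait_until(self._rebalance_finished,  # 4. wait for rebalance to finish
                       timeout_sec=300, backoff_sec=5,
                       err_msg="Rebalance did not finish within 5 minutes")

        # Placeholders: the PoC implements these with real Ignite services
        # and log-based checks.
        def _start_server_nodes(self, count):
            pass

        def _load_data(self):
            pass

        def _rebalance_finished(self):
            return True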