Folks,

First, I've created PR [1] with ducktests improvements. The PR contains the following changes:
- PME-free switch proof benchmark (2.7.6 vs master)
- Ability to check against (compare with) previous releases (e.g. 2.7.6 & 2.8)
- Global refactoring:
-- benchmark Java code simplification
-- deduplication of the services' Python and Java class code
-- fail-fast checks for Java and Python (e.g. an application must explicitly report that it finished successfully)
-- simple extraction of results from tests and benchmarks
-- Java code is now configurable from tests/benchmarks
-- proper SIGTERM handling in the Java code (e.g. it can finish the last operation and log results)
-- the Docker volume is now marked as delegated, to increase execution speed for Mac & Windows users
-- Ignite cluster nodes now start in parallel (start-up speed-up)
-- Ignite can be configured per test/benchmark
- Full and per-module assembly scripts added
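To give a flavor of the test-side API, here is a minimal sketch of a version-parametrized ducktape test. The decorators and base classes are ducktape's own; IgniteService below is an illustrative stand-in, not the exact class from the PR:

    from ducktape.mark import parametrize
    from ducktape.mark.resource import cluster
    from ducktape.services.service import Service
    from ducktape.tests.test import Test


    class IgniteService(Service):
        """Illustrative stand-in for the Ignite service from the PR."""

        def __init__(self, context, num_nodes, version):
            super(IgniteService, self).__init__(context, num_nodes=num_nodes)
            self.version = version

        def start_node(self, node):
            pass  # a real service would launch ignite.sh for self.version here

        def stop_node(self, node):
            pass  # a real service would send SIGTERM and let the node log results

        def clean_node(self, node):
            pass  # a real service would remove work and log directories


    class RebalanceBenchmarkSketch(Test):
        # Run the same scenario against a previous release and master.
        @cluster(num_nodes=3)
        @parametrize(version="2.7.6")
        @parametrize(version="dev")
        def test_rebalance(self, version):
            ignite = IgniteService(self.test_context, num_nodes=3, version=version)
            ignite.start()
            # ... load data, trigger rebalance, extract results for the report ...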
Second, I'd like to propose accepting ducktests [2] (the ducktape integration) as the target tool for PoC checks and real-topology benchmarking.

Ducktape pros:
- Developed for distributed systems by distributed-systems developers.
- Developed since 2014, stable.
- Usability proven by its use in Kafka.
- Dozens of tests and benchmarks in Kafka serve as a great example pack.
- Built-in Docker support for rapid development and checks.
- Great for CI automation.

As additional motivation, at least 3 teams:
- the IEP-45 team (to check crash-recovery speed-up: discovery and Zabbix speed-up),
- the Ignite SE Plugins team (to check that plugin features do not slow down or break AI features),
- the Ignite SE QA team (to add already developed smoke/load/failover tests to the AI codebase)
are now waiting for the ducktests merge to start checking, the AI way, the cases they are working on.

Thoughts?

[1] https://github.com/apache/ignite/pull/7967
[2] https://github.com/apache/ignite/tree/ignite-ducktape

On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <nizhi...@apache.org> wrote:

> Hello, Maxim.
>
> Thank you for such a detailed explanation.
>
> Can we put the content of this discussion somewhere on the wiki,
> so it doesn't get lost?
>
> I've divided the answer into several parts, from the requirements to the
> implementation.
> So, if we agree on the requirements, we can proceed with the discussion
> of the implementation.
>
> 1. Requirements:
>
> The main goal I want to achieve is *reproducibility* of the tests.
> I'm sick and tired of the zillions of flaky, rarely failing, and almost
> never failing tests in the Ignite codebase.
> We should start with the simplest scenarios, which will be as reliable
> as steel :)
>
> I want to know for sure:
> - Does this PR make rebalance quicker or not?
> - Does this PR make PME quicker or not?
>
> So, your description of the complex test scenario looks like a next step
> to me.
> Anyway, it's cool that we already have one.
>
> The second goal is to have a strict test lifecycle, as we have in JUnit
> and similar frameworks.
>
> > It covers production-like deployment and running scenarios over a
> > single database instance.
>
> Do you mean «single cluster» or «single host»?
>
> 2. Existing tests:
>
> > A Combinator suite allows running a set of operations concurrently
> > over a given database instance.
> > A Consumption suite allows running a set of production-like actions
> > over a given set of Ignite/GridGain versions and comparing test metrics
> > across versions.
> > A Yardstick suite.
> > A Stress suite that simulates hardware environment degradation.
> > Ultimate, DR and Compatibility suites that perform functional
> > regression testing.
> > Regression.
>
> Great news that we already have so many choices for testing!
> A mature test base is a big +1 for Tiden.
>
> 3. Comparison:
>
> > Criteria: Test configuration
> > Ducktape: single JSON string for all tests.
> > Tiden: any number of YAML config files, command-line options for
> > fine-grained test configuration, ability to select/modify test behavior
> > based on the Ignite version.
>
> 1. Many YAML files can be hard to maintain.
> 2. In ducktape, you can set parameters via the «--parameters» option.
> Please take a look at the doc [1].
>
> > Criteria: Cluster control
> > Tiden: additionally can address the cluster as a whole and execute
> > remote commands in parallel.
>
> It seems we have already implemented this ability in the PoC.
>
> > Criteria: Test assertions
> > Tiden: simple asserts, plus a few customized assertion helpers.
> > Ducktape: simple asserts.
>
> Can you please be more specific?
> What helpers do you have in mind?
> Ducktape has asserts that wait for log-file messages or for some process
> to finish.
>
> > Criteria: Test reporting
> > Ducktape: limited to its own text/HTML format.
>
> Ducktape has
> 1. A text reporter.
> 2. A customizable HTML reporter.
> 3. A JSON reporter.
>
> We can render the JSON with any template or tool.
>
> > Criteria: Provisioning and deployment
> > Ducktape: can provision a subset of hosts from the cluster for test
> > needs. However, that means the test can't be scaled without test code
> > changes. Does not do any deployment; relies on external means, e.g.
> > pre-packaged in a docker image, as in the PoC.
>
> This is not true.
>
> 1. We can set explicit test parameters (node numbers) via parameters.
> We can increase the client count or the cluster size without test code
> changes.
>
> 2. We have many choices for the test environment. These choices are
> tested and used in other projects:
> * docker
> * vagrant
> * private cloud (ssh access)
> * ec2
> Please take a look at the Kafka documentation [2].
>
> > I can continue more on this, but it should be enough for now.
>
> We need to go deeper! :)
>
> [1] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
> [2] https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
>
> > On June 9, 2020, at 17:25, Max A. Shonichev <mshon...@yandex.ru>
> > wrote:
> >
> > Greetings, Nikolay,
> >
> > First of all, thank you for your great effort preparing a PoC of
> > integration testing for the Ignite community.
> >
> > It's a shame Ignite did not have at least some such tests yet. However,
> > GridGain, as a major contributor to Apache Ignite, has had a profound
> > collection of in-house tools for integration and performance testing
> > for years already, and while we were slowly considering sharing our
> > expertise with the community, your initiative makes us drive that
> > process a bit faster. Thanks a lot!
> >
> > I reviewed your PoC and want to share a little about what we do on our
> > part, why and how; I hope it will help the community take the proper
> > course.
> >
> > First I'll give a brief overview of what decisions we made and what we
> > have in our private code base, next I'll describe what we have already
> > donated to the public and what we plan to publish next, and then I'll
> > compare both approaches, highlighting deficiencies, in order to spur
> > public discussion on the matter.
> >
> > It might seem strange to use Python to run Bash to run Java
> > applications, because that introduces the IT industry's 'best of breed'
> > – the Python dependency hell – into the Java application code base. The
> > only stranger decision one could make is to use Maven to run Docker to
> > run Bash to run Python to run Bash to run Java, but desperate times
> > call for desperate measures, I guess.
> >
> > There are Java-based solutions for integration testing, e.g.
> > Testcontainers [1], Arquillian [2], etc., and they might suit Ignite
> > community CI pipelines by themselves. But we also wanted to run
> > performance tests and benchmarks, like the dreaded PME benchmark, and
> > that is solved by a totally different set of tools in the Java world,
> > e.g. JMeter [3], JMH [4], Gatling [5], etc.
> >
> > Speaking specifically about benchmarking, the Apache Ignite community
> > already has Yardstick [6], and there's nothing wrong with writing a PME
> > benchmark using Yardstick, but we also wanted to be able to run
> > scenarios like this:
> > - put an X load on an Ignite database;
> > - perform a set of operations Y to check how Ignite copes with
> > operations under load.
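Such an "operations under load" scenario could be sketched in ducktape terms roughly as follows; every helper below is a placeholder shown only to illustrate the shape of the scenario, not an existing API:

    import time

    from ducktape.tests.test import Test


    class OpsUnderLoadSketch(Test):
        """Skeleton of the 'perform operations Y under load X' scenario."""

        def test_operation_under_load(self):
            load = self._start_background_load()   # X: steady put/get load
            try:
                started = time.time()
                self._run_checked_operation()      # Y: e.g. trigger a PME
                elapsed = time.time() - started
            finally:
                load.stop()
            # A value returned from a ducktape test ends up in its result data.
            return {"operation_under_load_sec": elapsed}

        # Placeholders: a real test would start a load application service
        # and drive the operation through an Ignite client.
        def _start_background_load(self):
            class _NoopLoad(object):
                def stop(self):
                    pass
            return _NoopLoad()

        def _run_checked_operation(self):
            pass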
> >
> > And yes, we also wanted the applications under test to be deployed
> > 'like in production', e.g. distributed over a set of hosts. This raises
> > questions about provisioning and node affinity, which I'll cover in
> > detail later.
> >
> > So we decided to put in a little effort to build a simple tool covering
> > different integration and performance scenarios, and our QA lab's first
> > attempt was PoC-Tester [7], currently open source for everything but
> > the reporting web UI. It's a quite simple to use, 95% Java-based tool
> > targeted at the pre-release QA stage.
> >
> > It covers production-like deployment and running scenarios over a
> > single database instance. PoC-Tester scenarios consist of a sequence of
> > tasks running sequentially or in parallel. After all tasks complete, or
> > at any time during the test, the user can run a log collection task;
> > logs are checked for exceptions, and a summary of the found issues and
> > task ops/latency statistics is generated at the end of the scenario.
> > One of the main PoC-Tester features is its fire-and-forget approach to
> > task management. That is, you can deploy a grid and leave it running
> > for weeks, periodically firing some tasks at it.
> >
> > During the earliest stages of PoC-Tester development it became quite
> > clear that Java application development is a tedious process and that
> > the architecture decisions you take during development are slow and
> > hard to change.
> > For example, scenarios like this:
> > - deploy two instances of GridGain with master-slave data replication
> > configured;
> > - put a load on the master;
> > - perform checks on the slave,
> > or like this:
> > - preload 1 TB of data, using your favorite tool of choice, into an
> > Apache Ignite of version X;
> > - run a set of functional tests on Apache Ignite version Y over the
> > preloaded data,
> > do not fit well into the PoC-Tester workflow.
> >
> > So, this is why we decided to use Python as our generic scripting
> > language of choice.
> >
> > Pros:
> > - quicker prototyping and development cycles
> > - easier to find a DevOps/QA engineer with Python skills than one with
> > Java skills
> > - used extensively all over the world for DevOps/CI pipelines, and thus
> > has a rich set of libraries for all possible integration use cases
> >
> > Cons:
> > - a nightmare with dependencies; better stick to specific
> > language/library versions
> >
> > Comparing alternatives for a Python-based testing framework, we
> > considered the following requirements, somewhat similar to what you
> > mentioned from Confluent [8] previously:
> > - should be able to run locally or distributed (bare metal or in the
> > cloud)
> > - should have built-in deployment facilities for the applications under
> > test
> > - should separate test configuration from test code
> > -- be able to easily reconfigure tests by simple configuration changes
> > -- be able to easily scale the test environment by simple configuration
> > changes
> > -- be able to perform regression testing by simply switching the
> > artifacts under test via configuration
> > -- be able to run tests with different JDK versions by simple
> > configuration changes
> > - should have human-readable reports and/or reporting tools integration
> > - should allow simple test progress monitoring; one does not want to
> > run a 6-hour test only to find out that the application actually
> > crashed during the first hour
> > - should allow parallel execution of test actions
> > - should have a clean API for test writers:
> > -- a clean API for distributed remote command execution
> > -- a clean API for starting / stopping deployed applications and other
> > operations
> > -- a clean API for performing checks on results
> > - should be open source, or at least the source code should allow easy
> > change or extension
> >
> > Back at that time we found no better alternative than to write our own
> > framework, and here goes Tiden [9], GridGain's framework of choice for
> > functional integration and performance testing.
> >
> > Pros:
> > - solves all the requirements above
> > Cons (for Ignite):
> > - (currently) closed GridGain source
> >
> > On top of Tiden we've built a set of test suites, some of which you
> > might have heard of already.
> >
> > A Combinator suite allows running a set of operations concurrently over
> > a given database instance. Proven to have found at least 30+ race
> > conditions and NPE issues.
> >
> > A Consumption suite allows running a set of production-like actions
> > over a given set of Ignite/GridGain versions and comparing test metrics
> > across versions: heap/disk/CPU consumption and the time to perform
> > actions such as client PME, server PME, rebalancing, data replication,
> > etc.
> >
> > A Yardstick suite is a thin layer of Python glue code to run the Apache
> > Ignite pre-release benchmark set. Yardstick itself has mediocre
> > deployment capabilities; Tiden solves this easily.
> >
> > A Stress suite simulates hardware environment degradation during
> > testing.
> >
> > Ultimate, DR and Compatibility suites perform functional regression
> > testing of GridGain Ultimate Edition features like snapshots, security,
> > data replication, rolling upgrades, etc.
> >
> > A Regression suite and some IEP testing suites, like IEP-14, IEP-15,
> > etc.
> >
> > Most of the suites above use another in-house developed Java tool –
> > PiClient – to perform the actual loading and miscellaneous operations
> > on the Ignite under test. We use the py4j Python-Java gateway library
> > to control PiClient instances from the tests.
> >
> > When we considered CI, we put TeamCity out of scope, because
> > distributed integration and performance tests tend to run for hours,
> > and TeamCity agents are a scarce and costly resource. So, bundled with
> > Tiden there are jenkins-job-builder [10] based CI pipelines and Jenkins
> > xUnit reporting. Also, a rich web UI tool, Ward, aggregates test run
> > reports across versions and has built-in visualization support for the
> > Combinator suite.
> >
> > All of the above is currently closed source, but we plan to make it
> > public for the community, and publishing the Tiden core [9] is the
> > first step on that way. You can review some examples of using Tiden for
> > tests in my repository [11], for a start.
> >
> > Now, let's compare the ducktape PoC and Tiden.
> >
> > Criteria: Language
> > Tiden: Python 3.7.
> > Ducktape: Python; advertises itself as Python 2.7, 3.6 and 3.7
> > compatible, but actually can't work with Python 3.7 due to a broken ZMQ
> > dependency.
> > Comment: Python 3.7 has much better support for async-style code, which
> > might be crucial for distributed application testing.
> > Score: Tiden: 1, Ducktape: 0
> >
> > Criteria: Test writers API
> > The supported integration test framework concepts are basically the
> > same:
> > - a test controller (test runner)
> > - a cluster
> > - a node
> > - an application (a service in ducktape terms)
> > - a test
> > Score: Tiden: 5, Ducktape: 5
> >
> > Criteria: Test selection and run
> > Ducktape: suite-package-class-method level selection; an internal
> > scheduler allows running the tests in a suite in parallel.
> > Tiden: also suite-package-class-method level selection; additionally
> > allows selecting a subset of tests by attribute; parallel runs are not
> > built in, but it allows merging test reports from different runs.
> > Score: Tiden: 2, Ducktape: 2
> >
> > Criteria: Test configuration
> > Ducktape: single JSON string for all tests.
> > Tiden: any number of YAML config files, command-line options for
> > fine-grained test configuration, ability to select/modify test behavior
> > based on the Ignite version.
> > Score: Tiden: 3, Ducktape: 1
> >
> > Criteria: Cluster control
> > Ducktape: allows executing remote commands at node granularity.
> > Tiden: additionally can address the cluster as a whole and execute
> > remote commands in parallel.
> > Score: Tiden: 2, Ducktape: 1
> >
> > Criteria: Logs control
> > Both frameworks have similar built-in support for remote log collection
> > and grepping. Tiden has a built-in plugin that can zip and collect
> > arbitrary log files from arbitrary locations at test/module/suite
> > granularity, unzipping them if needed, plus an application API to
> > search / wait for messages in logs. Ducktape lets each service declare
> > its log file locations (seemingly does not support log rolling), with a
> > single entry point to collect service logs.
> > Score: Tiden: 1, Ducktape: 1
> >
> > Criteria: Test assertions
> > Tiden: simple asserts, plus a few customized assertion helpers.
> > Ducktape: simple asserts.
> > Score: Tiden: 2, Ducktape: 1
> >
> > Criteria: Test reporting
> > Ducktape: limited to its own text/HTML format.
> > Tiden: provides a text report, a YAML report for reporting tools
> > integration, and an XML xUnit report for integration with
> > Jenkins/TeamCity.
> > Score: Tiden: 3, Ducktape: 1
> >
> > Criteria: Provisioning and deployment
> > Ducktape: can provision a subset of hosts from the cluster for test
> > needs. However, that means the test can't be scaled without test code
> > changes. Does not do any deployment; relies on external means, e.g.
> > pre-packaged in a docker image, as in the PoC.
> > Tiden: given a set of hosts, Tiden uses all of them for the test.
> > Provisioning should be done by external means. However, it provides
> > conventional automated deployment routines.
> > Score: Tiden: 1, Ducktape: 1
> >
> > Criteria: Documentation and extensibility
> > Tiden: the current API documentation is limited; that should change as
> > we go open source. Tiden is easily extensible via hooks and plugins;
> > see the example Maven plugin and Gatling application at [11].
> > Ducktape: basic documentation at readthedocs.io. The codebase is rigid,
> > the framework core is tightly coupled and hard to change. The only
> > possible extension mechanism is fork-and-rewrite.
> > Score: Tiden: 2, Ducktape: 1
> >
> > I can continue more on this, but it should be enough for now.
> > Overall score: Tiden: 22, Ducktape: 14.
> >
> > Time for discussion!
> >
> > ---
> > [1] - https://www.testcontainers.org/
> > [2] - http://arquillian.org/guides/getting_started/
> > [3] - https://jmeter.apache.org/index.html
> > [4] - https://openjdk.java.net/projects/code-tools/jmh/
> > [5] - https://gatling.io/docs/current/
> > [6] - https://github.com/gridgain/yardstick
> > [7] - https://github.com/gridgain/poc-tester
> > [8] - https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
> > [9] - https://github.com/gridgain/tiden
> > [10] - https://pypi.org/project/jenkins-job-builder/
> > [11] - https://github.com/mshonichev/tiden_examples
> >
> > On 25.05.2020 11:09, Nikolay Izhikov wrote:
> >> Hello,
> >>
> >> The branch with ducktape has been created -
> >> https://github.com/apache/ignite/tree/ignite-ducktape
> >>
> >> Anyone who is willing to contribute to the PoC is welcome.
> >>
> >>
> >>> On May 21, 2020, at 22:33, Nikolay Izhikov <nizhikov....@gmail.com>
> >>> wrote:
> >>>
> >>> Hello, Denis.
> >>>
> >>> There is no rush with these improvements.
> >>> We can wait for Maxim's proposal and compare the two solutions :)
> >>>
> >>>> On May 21, 2020, at 22:24, Denis Magda <dma...@apache.org> wrote:
> >>>>
> >>>> Hi Nikolay,
> >>>>
> >>>> Thanks for kicking off this conversation and sharing your findings
> >>>> with the results. That's the right initiative. I do agree that Ignite
> >>>> needs to have an integration testing framework with the capabilities
> >>>> you listed.
> >>>>
> >>>> As we discussed privately, I would only check whether, instead of
> >>>> Confluent's ducktape library, we can use an integration testing
> >>>> framework developed by GridGain for testing Ignite/GridGain clusters.
> >>>> That framework has been battle-tested and might be more convenient
> >>>> for Ignite-specific workloads. Let's wait for @Maksim Shonichev
> >>>> <mshonic...@gridgain.com>, who promised to join this thread once he
> >>>> finishes preparing the usage examples of the framework. To my
> >>>> knowledge, Max has already been working on that for several days.
> >>>>
> >>>> -
> >>>> Denis
> >>>>
> >>>>
> >>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov <nizhi...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Hello, Igniters.
> >>>>>
> >>>>> I created a PoC [1] for the integration tests of Ignite.
> >>>>>
> >>>>> Let me briefly explain the gap I want to cover:
> >>>>>
> >>>>> 1. For now, we don't have a solution for automated testing of Ignite
> >>>>> on a «real cluster».
> >>>>> By «real cluster» I mean a cluster «like in production»:
> >>>>> * client and server nodes deployed on different hosts,
> >>>>> * thin clients performing queries from some other hosts,
> >>>>> * etc.
> >>>>>
> >>>>> 2. We don't have a solution for automated benchmarks of internal
> >>>>> Ignite processes:
> >>>>> * PME,
> >>>>> * rebalance.
> >>>>> This means we don't know: is rebalance (or PME) in 2.7.0 faster or
> >>>>> slower than in 2.8.0 for the same cluster?
> >>>>>
> >>>>> 3. We don't have a solution for automated testing of Ignite
> >>>>> integrations in a real-world environment:
> >>>>> the Ignite-Spark integration can be taken as an example.
> >>>>> I think some ML solutions should also be tested in real-world
> >>>>> deployments.
> >>>>>
> >>>>> Solution:
> >>>>>
> >>>>> I propose to use the ducktape library from Confluent (Apache 2.0
> >>>>> license).
> >>>>> I tested it both on a real cluster (Yandex Cloud) and in a local
> >>>>> environment (docker), and it works just fine.
> >>>>>
> >>>>> The PoC contains the following services:
> >>>>>
> >>>>> * A simple rebalance test:
> >>>>> start 2 server nodes,
> >>>>> create some data with an Ignite client,
> >>>>> start one more server node,
> >>>>> wait for rebalance to finish.
> >>>>> * A simple Ignite-Spark integration test:
> >>>>> start 1 Spark master and 1 Spark worker,
> >>>>> start 1 Ignite server node,
> >>>>> create some data with an Ignite client,
> >>>>> check the data in an application that queries it from Spark.
> >>>>>
> >>>>> All tests are fully automated.
> >>>>> Log collection works just fine.
> >>>>> You can see an example of the test report - [4].
> >>>>>
> >>>>> Pros:
> >>>>>
> >>>>> * Ability to test local changes (no need to publish changes to some
> >>>>> remote repository or similar).
> >>>>> * Ability to parametrize the test environment (run the same tests
> >>>>> on different JDKs, JVM params, configs, etc.).
> >>>>> * Isolation by default, so system tests are as reliable as possible.
> >>>>> * Utilities for easily pulling up and tearing down services in
> >>>>> clusters in different environments (e.g. local, custom cluster,
> >>>>> Vagrant, K8s, Mesos, Docker, cloud providers, etc.).
> >>>>> * Easy to write unit tests for distributed systems.
> >>>>> * Adopted and successfully used by another distributed open source
> >>>>> project - Apache Kafka.
> >>>>> * Collects results (e.g. logs, console output).
> >>>>> * Reports results (e.g. expected conditions met, performance
> >>>>> results, etc.).
> >>>>>
> >>>>> WDYT?
> >>>>>
> >>>>> [1] https://github.com/nizhikov/ignite/pull/15
> >>>>> [2] https://github.com/confluentinc/ducktape
> >>>>> [3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
> >>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
> >
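For reference, the "simple rebalance test" described in the quoted PoC message maps onto ducktape roughly like this (a sketch: wait_until is real ducktape API, while the helper methods stand in for the PoC's services):

    from ducktape.tests.test import Test
    from ducktape.utils.util import wait_until


    class SimpleRebalanceSketch(Test):
        """Rough shape of the PoC's rebalance scenario."""

        def test_rebalance(self):
            self._start_server_nodes(2)           # 1. start two server nodes
            self._load_data()                     # 2. create data with an Ignite client
            self._start_server_nodes(1)           # 3. start one more server node
            wait_until(self._rebalance_finished,  # 4. wait for rebalance to finish
                       timeout_sec=300, backoff_sec=5,
                       err_msg="Rebalance did not finish within 5 minutes")

        # Placeholders: the PoC implements these with real Ignite services
        # and log-based checks.
        def _start_server_nodes(self, count):
            pass

        def _load_data(self):
            pass

        def _rebalance_finished(self):
            return True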