Greetings, Nikolay,
First of all, thank you for your great effort preparing a PoC of
integration testing for the Ignite community.
It’s a shame Ignite does not have at least some such tests yet. However,
GridGain, as a major contributor to Apache Ignite, has had a substantial
collection of in-house tools for integration and performance testing for
years already, and while we have been slowly considering sharing our
expertise with the community, your initiative makes us drive that
process a bit faster. Thanks a lot!
I reviewed your PoC and want to share a little about what we do on our
side, why and how; I hope it will help the community take the proper course.
First I’ll give a brief overview of the decisions we made and what we
have in our private code base, next I’ll describe what we have already
donated to the public and what we plan to publish next, and then I’ll
compare both approaches, highlighting deficiencies, in order to spur
public discussion on the matter.
It might seem strange to use Python to run Bash to run Java applications,
because that introduces the IT industry’s ‘best of breed’ – the Python
dependency hell – into a Java application code base. The only stranger
decision one can make is to use Maven to run Docker to run Bash to run
Python to run Bash to run Java, but desperate times call for desperate
measures, I guess.
There are Java-based solutions for integration testing, e.g.
Testcontainers [1], Arquillian [2], etc., and they might suit the Ignite
community CI pipelines by themselves. But we also wanted to run
performance tests and benchmarks, like the dreaded PME benchmark, and
that is solved by a totally different set of tools in the Java world,
e.g. JMeter [3], JMH [4], Gatling [5], etc.
Speaking specifically about benchmarking, the Apache Ignite community
already has Yardstick [6], and there’s nothing wrong with writing a PME
benchmark using Yardstick, but we also wanted to be able to run
scenarios like this (see the sketch after this list):
- put an X load onto an Ignite database;
- perform a Y set of operations to check how Ignite copes with them
under load.
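To illustrate the shape of such a scenario, here is a minimal,
framework-agnostic Python sketch; the put_load/run_checks helpers are
hypothetical placeholders, not actual Tiden or PoC-Tester API:

import threading
import time

def put_load(stop_event):
    # hypothetical background load: keep writing until asked to stop
    while not stop_event.is_set():
        # e.g. cache.put(random_key(), random_value())
        time.sleep(0.01)

def run_checks():
    # hypothetical Y set of operations verified while the load is running,
    # e.g. trigger a client PME and assert its duration
    pass

stop = threading.Event()
loader = threading.Thread(target=put_load, args=(stop,), daemon=True)
loader.start()        # X: put the load onto the database
try:
    run_checks()      # Y: perform the checked operations under load
finally:
    stop.set()
    loader.join()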
And yes, we also wanted the applications under test to be deployed ‘like
in production’, i.e. distributed over a set of hosts. This raises
questions about provisioning and node affinity, which I’ll cover in
detail later.
So we decided to put in a little effort to build a simple tool covering
different integration and performance scenarios, and our QA lab’s first
attempt was PoC-Tester [7], currently open source except for the
reporting web UI. It’s a quite simple to use, 95% Java-based tool
targeted at the pre-release QA stage.
It covers production-like deployment and running scenarios over a
single database instance. A PoC-Tester scenario consists of a sequence
of tasks running sequentially or in parallel. After all tasks complete,
or at any time during a test, the user can run a logs collection task:
logs are checked for exceptions, and a summary of the found issues and
per-task ops/latency statistics is generated at the end of the scenario.
One of the main PoC-Tester features is its fire-and-forget approach to
task management. That is, you can deploy a grid and leave it running for
weeks, periodically firing tasks onto it.
During the earliest stages of PoC-Tester development it became quite
clear that Java application development is a tedious process, and the
architecture decisions you take during development are slow and hard to
change.
For example, scenarios like this:
- deploy two instances of GridGain with master-slave data replication
configured;
- put a load on the master;
- perform checks on the slave,
or like this:
- preload 1 TB of data into Apache Ignite version X using your favorite
tool of choice;
- run a set of functional tests against Apache Ignite version Y over the
preloaded data,
do not fit well into the PoC-Tester workflow.
So this is why we decided to use Python as our generic scripting
language of choice.
Pros:
- quicker prototyping and development cycles;
- it is easier to find a DevOps/QA engineer with Python skills than one
with Java skills;
- Python is used extensively all over the world for DevOps/CI pipelines
and thus has a rich set of libraries for all possible integration use cases.
Cons:
- dependency management is a nightmare; better stick to specific
language/library versions.
Comparing alternatives for a Python-based testing framework, we
considered the following requirements, somewhat similar to what you
mentioned for Confluent [8] previously:
- should be able to run locally or distributed (bare metal or in the cloud)
- should have built-in deployment facilities for applications under test
- should separate test configuration from test code
-- be able to easily reconfigure tests by simple configuration changes
-- be able to easily scale the test environment by simple configuration changes
-- be able to perform regression testing by simply switching the
artifacts under test via configuration
-- be able to run tests with different JDK versions by simple
configuration changes
- should have human-readable reports and/or reporting tools integration
- should allow simple test progress monitoring; one does not want to run
a 6-hour test only to find out that the application actually crashed
during the first hour
- should allow parallel execution of test actions
- should have a clean API for test writers
-- a clean API for distributed remote command execution
-- a clean API for starting/stopping deployed applications and other operations
-- a clean API for performing checks on results
- should be open source, or at least the source code should allow easy
change or extension
Back then we found no better alternative than to write our own
framework, and so Tiden [9] became the GridGain framework of choice for
functional integration and performance testing.
Pros:
- solves all the requirements above
Cons (for Ignite):
- (currently) closed GridGain source
On top of Tiden we’ve built a set of test suites, some of which you
might have heard of already.
The Combinator suite allows running a set of operations concurrently
over a given database instance. It has proven to find 30+ race
conditions and NPE issues.
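For context, a rough sketch of the Combinator idea in plain Python (the
operation names below are hypothetical stand-ins, not the real suite code):

from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def restart_node(): ...      # placeholders for real cluster operations
def create_snapshot(): ...
def run_pme(): ...

OPERATIONS = [restart_node, create_snapshot, run_pme]

# run every pair of operations concurrently and see whether any combination breaks
for op_a, op_b in combinations(OPERATIONS, 2):
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(op_a), pool.submit(op_b)]
        for f in futures:
            f.result()       # a failure here is a race/NPE candidate for this pair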
The Consumption suite runs a set of production-like actions over a given
set of Ignite/GridGain versions and compares test metrics across
versions, such as heap/disk/CPU consumption and the time to perform
actions: client PME, server PME, rebalancing, data replication, etc.
The Yardstick suite is a thin layer of Python glue code to run the
Apache Ignite pre-release benchmark set. Yardstick itself has mediocre
deployment capabilities; Tiden solves this easily.
The Stress suite simulates hardware environment degradation during
testing.
The Ultimate, DR and Compatibility suites perform functional regression
testing of GridGain Ultimate Edition features like snapshots, security,
data replication, rolling upgrades, etc.
There are also a Regression suite and suites for testing some IEPs, like
IEP-14, IEP-15, and so on.
Most of the suites above use another in-house Java tool – PiClient – to
perform the actual loading and miscellaneous operations against the
Ignite under test. We use the py4j Python-Java gateway library to
control PiClient instances from the tests.
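For those unfamiliar with py4j, the control flow looks roughly like
this; the gateway calls are standard py4j, while the PiClient methods
are hypothetical since PiClient is not public:

from py4j.java_gateway import JavaGateway

gateway = JavaGateway()           # connects to a JVM running a py4j GatewayServer
pi_client = gateway.entry_point   # the Java object (PiClient) exposed by that JVM
# pi_client.startLoad("cache1", 8)   # hypothetical PiClient calls
# pi_client.stopLoad()
gateway.shutdown()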
When we considered CI, we put TeamCity out of scope, because distributed
integration and performance tests tend to run for hours, and TeamCity
agents are a scarce and costly resource. So, bundled with Tiden there
are jenkins-job-builder [10] based CI pipelines and Jenkins xUnit
reporting. Also, a rich web UI tool, Ward, aggregates test run reports
across versions and has built-in visualization support for the
Combinator suite.
All of the above is currently closed source, but we plan to make it
public for the community, and publishing the Tiden core [9] is the first
step on that path. For a start, you can review some examples of using
Tiden for tests in my repository [11].
Now, let’s compare Ducktape PoC and Tiden.
Criteria: Language
Tiden: Python 3.7
Ducktape: Python; advertises itself as Python 2.7, 3.6 and 3.7
compatible, but actually can’t work with Python 3.7 due to a broken ZMQ
dependency.
Comment: Python 3.7 has much better support for async-style code, which
might be crucial for distributed application testing (a tiny example
follows this criteria block).
Score: Tiden: 1, Ducktape: 0
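By async-style code I mean things like fanning out remote commands
concurrently; a tiny sketch on Python 3.7+ (the host names and command
are made up):

import asyncio

async def run_ssh(host, cmd):
    # run a remote command via ssh as an asyncio subprocess
    proc = await asyncio.create_subprocess_shell(
        "ssh {} '{}'".format(host, cmd),
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    return host, proc.returncode, out

async def main():
    hosts = ["host1", "host2", "host3"]    # made-up host names
    results = await asyncio.gather(*(run_ssh(h, "uptime") for h in hosts))
    for host, rc, out in results:
        print(host, rc, out.decode().strip())

asyncio.run(main())    # asyncio.run() is new in Python 3.7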
Criteria: Test writers API
The supported integration test framework concepts are basically the same
(a rough sketch in Ducktape terms follows this criteria block):
- a test controller (test runner)
- a cluster
- a node
- an application (a service in Ducktape terms)
- a test
Score: Tiden: 5, Ducktape: 5
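To make the shared concepts concrete, here is a rough sketch in Ducktape
terms; the Ignite start/stop commands, paths and log locations are made
up for illustration, and a Tiden application looks conceptually similar:

from ducktape.services.service import Service
from ducktape.tests.test import Test

class IgniteService(Service):
    # declares which log files Ducktape should collect from the nodes
    logs = {"ignite_log": {"path": "/mnt/ignite/logs", "collect_default": True}}

    def __init__(self, context, num_nodes):
        super(IgniteService, self).__init__(context, num_nodes=num_nodes)

    def start_node(self, node):
        node.account.ssh("/opt/ignite/bin/ignite.sh -d")   # hypothetical start command

    def stop_node(self, node):
        node.account.ssh("pkill -f ignite", allow_fail=True)

    def clean_node(self, node):
        node.account.ssh("rm -rf /mnt/ignite")

class SmokeTest(Test):
    def __init__(self, test_context):
        super(SmokeTest, self).__init__(test_context=test_context)
        self.ignite = IgniteService(test_context, num_nodes=2)   # the cluster of nodes

    def test_start_stop(self):
        self.ignite.start()
        self.ignite.stop()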
Criteria: Tests selection and run
Ducktape: suite-package-class-method level selection; an internal
scheduler allows running the tests in a suite in parallel.
Tiden: also suite-package-class-method level selection; additionally
allows selecting a subset of tests by attribute; parallel runs are not
built in, but merging test reports from different runs is supported.
Score: Tiden: 2, Ducktape: 2
Criteria: Test configuration
Ducktape: a single JSON string for all tests.
Tiden: any number of YAML config files, command line options for
fine-grained test configuration, and the ability to select/modify test
behavior based on the Ignite version.
Score: Tiden: 3, Ducktape: 1
Criteria: Cluster control
Ducktape: allows executing remote commands at node granularity.
Tiden: additionally can address the cluster as a whole and execute
remote commands in parallel (see the sketch below).
Score: Tiden: 2, Ducktape: 1
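As a generic illustration of the difference (the run_on_host helper and
host names are hypothetical, not actual Tiden or Ducktape API):

from concurrent.futures import ThreadPoolExecutor
import subprocess

HOSTS = ["host1", "host2", "host3"]    # made-up host names

def run_on_host(host, cmd):
    # hypothetical helper: run a command on one host over ssh
    return subprocess.run(["ssh", host, cmd], capture_output=True, text=True)

# node granularity: one host after another (Ducktape-style iteration over nodes)
for host in HOSTS:
    run_on_host(host, "jps")

# whole cluster at once: the same command on all hosts in parallel (what Tiden adds)
with ThreadPoolExecutor(max_workers=len(HOSTS)) as pool:
    results = list(pool.map(lambda host: run_on_host(host, "jps"), HOSTS))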
Criteria: Logs control
Both frameworks have similar built-in support for remote logs collection
and grepping. Tiden has a built-in plugin that can zip, collect
arbitrary log files from arbitrary locations at test/module/suite
granularity and unzip them if needed, plus an application API to search
for or wait for messages in logs. Ducktape allows each service to
declare its log file locations (it seemingly does not support log
rollover), and provides a single entry point to collect service logs.
Score: Tiden: 1, Ducktape: 1
Criteria: Test assertions
Tiden: simple asserts, plus a few customized assertion helpers.
Ducktape: simple asserts.
Score: Tiden: 2, Ducktape: 1
Criteria: Test reporting
Ducktape: limited to its own text/HTML format.
Tiden: provides a text report, a YAML report for reporting tools
integration, and an XML xUnit report for integration with Jenkins/TeamCity.
Score: Tiden: 3, Ducktape: 1
Criteria: Provisioning and deployment
Ducktape: can provision a subset of hosts from the cluster for test
needs. However, that means a test can’t be scaled without test code
changes. Does no deployment; relies on external means, e.g. artifacts
pre-packaged in a Docker image, as in the PoC.
Tiden: given a set of hosts, Tiden uses all of them for the test.
Provisioning should be done by external means. However, it provides
conventional automated deployment routines.
Score: Tiden: 1, Ducktape: 1
Criteria: Documentation and Extensibility
Tiden: the current API documentation is limited; that should change as
we go open source. Tiden is easily extensible via hooks and plugins, see
the example Maven plugin and Gatling application at [11].
Ducktape: basic documentation at readthedocs.io. The codebase is rigid,
the framework core is tightly coupled and hard to change. The only
possible extension mechanism is fork-and-rewrite.
Score: Tiden: 2, Ducktape: 1
I could continue on this, but it should be enough for now:
Overall score: Tiden: 22, Ducktape: 14.
Time for discussion!
---
[1] - https://www.testcontainers.org/
[2] - http://arquillian.org/guides/getting_started/
[3] - https://jmeter.apache.org/index.html
[4] - https://openjdk.java.net/projects/code-tools/jmh/
[5] - https://gatling.io/docs/current/
[6] - https://github.com/gridgain/yardstick
[7] - https://github.com/gridgain/poc-tester
[8] -
https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
[9] - https://github.com/gridgain/tiden
[10] - https://pypi.org/project/jenkins-job-builder/
[11] - https://github.com/mshonichev/tiden_examples
On 25.05.2020 11:09, Nikolay Izhikov wrote:
Hello,
Branch with duck tape created -
https://github.com/apache/ignite/tree/ignite-ducktape
Any who are willing to contribute to PoC are welcome.
On 21 May 2020, at 22:33, Nikolay Izhikov <nizhikov....@gmail.com> wrote:
Hello, Denis.
There is no rush with these improvements.
We can wait for Maxim proposal and compare two solutions :)
On 21 May 2020, at 22:24, Denis Magda <dma...@apache.org> wrote:
Hi Nikolay,
Thanks for kicking off this conversation and sharing your findings with the
results. That's the right initiative. I do agree that Ignite needs to have
an integration testing framework with capabilities listed by you.
As we discussed privately, I would only check if instead of
Confluent's Ducktape library, we can use an integration testing framework
developed by GridGain for testing of Ignite/GridGain clusters. That
framework has been battle-tested and might be more convenient for
Ignite-specific workloads. Let's wait for @Maksim Shonichev
<mshonic...@gridgain.com> who promised to join this thread once he finishes
preparing the usage examples of the framework. To my knowledge, Max has
already been working on that for several days.
-
Denis
On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov <nizhi...@apache.org>
wrote:
Hello, Igniters.
I created a PoC [1] for the integration tests of Ignite.
Let me briefly explain the gap I want to cover:
1. For now, we don’t have a solution for automated testing of Ignite on
«real cluster».
By «real cluster» I mean cluster «like a production»:
* client and server nodes deployed on different hosts.
* thin clients perform queries from some other hosts
* etc.
2. We don’t have a solution for automated benchmarks of some internal
Ignite process
* PME
* rebalance.
This means we don’t know whether we perform rebalance (or PME) in 2.7.0
faster or slower than in 2.8.0 for the same cluster.
3. We don’t have a solution for automated testing of Ignite integration in
a real-world environment:
Ignite-Spark integration can be taken as an example.
I think some ML solutions also should be tested in real-world deployments.
Solution:
I propose to use the ducktape library from Confluent (Apache 2.0 license)
I tested it both on the real cluster(Yandex Cloud) and on the local
environment(docker) and it works just fine.
PoC contains following services:
* Simple rebalance test:
Start 2 server nodes,
Create some data with Ignite client,
Start one more server node,
Wait for rebalance finish
* Simple Ignite-Spark integration test:
Start 1 Spark master, start 1 Spark worker,
Start 1 Ignite server node
Create some data with Ignite client,
Check data in application that queries it from Spark.
All tests are fully automated.
Logs collection works just fine.
You can see an example of the tests report - [4].
Pros:
* Ability to test local changes (no need to publish changes to some remote
repository or similar).
* Ability to parametrize test environment(run the same tests on different
JDK, JVM params, config, etc.)
* Isolation by default so system tests are as reliable as possible.
* Utilities for pulling up and tearing down services easily in clusters in
different environments (e.g. local, custom cluster, Vagrant, K8s, Mesos,
Docker, cloud providers, etc.)
* Easy to write unit tests for distributed systems
* Adopted and successfully used by another distributed open source project -
Apache Kafka.
* Collect results (e.g. logs, console output)
* Report results (e.g. expected conditions met, performance results, etc.)
WDYT?
[1] https://github.com/nizhikov/ignite/pull/15
[2] https://github.com/confluentinc/ducktape
[3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
[4] https://yadi.sk/d/JC8ciJZjrkdndg