Hi Nikolay,

Thanks for kicking off this conversation and sharing your findings and results. That's the right initiative. I agree that Ignite needs an integration testing framework with the capabilities you listed.
As we discussed privately, I would only like to check whether, instead of Confluent's Ducktape library, we could use an integration testing framework developed by GridGain for testing Ignite/GridGain clusters. That framework has been battle-tested and might be more convenient for Ignite-specific workloads. Let's wait for @Maksim Shonichev <mshonic...@gridgain.com>, who promised to join this thread once he finishes preparing usage examples of the framework. To my knowledge, Max has already been working on that for several days.

-
Denis

On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov <nizhi...@apache.org> wrote:

> Hello, Igniters.
>
> I created a PoC [1] for the integration tests of Ignite.
>
> Let me briefly explain the gap I want to cover:
>
> 1. For now, we don’t have a solution for automated testing of Ignite on a
> «real cluster».
> By «real cluster» I mean a production-like cluster:
> * client and server nodes deployed on different hosts;
> * thin clients performing queries from other hosts;
> * etc.
>
> 2. We don’t have a solution for automated benchmarks of some internal
> Ignite processes:
> * PME;
> * rebalance.
> This means we don’t know whether rebalance (or PME) is faster or slower
> in 2.7.0 than in 2.8.0 for the same cluster.
>
> 3. We don’t have a solution for automated testing of Ignite integrations in
> a real-world environment.
> Ignite-Spark integration can be taken as an example.
> I think some ML solutions should also be tested in real-world deployments.
>
> Solution:
>
> I propose to use the ducktape library [2] from Confluent (Apache 2.0 license).
> I tested it both on a real cluster (Yandex Cloud) and in a local
> environment (Docker), and it works just fine.
>
> The PoC contains the following tests:
>
> * Simple rebalance test:
>   start 2 server nodes,
>   create some data with an Ignite client,
>   start one more server node,
>   wait for the rebalance to finish.
> * Simple Ignite-Spark integration test:
>   start 1 Spark master and 1 Spark worker,
>   start 1 Ignite server node,
>   create some data with an Ignite client,
>   check the data in an application that queries it from Spark.
>
> All tests are fully automated.
> Log collection works just fine.
> You can see an example of the test report here [4].
>
> Pros:
>
> * Ability to test local changes (no need to publish changes to a remote
>   repository or similar).
> * Ability to parametrize the test environment (run the same tests on
>   different JDKs, JVM params, configs, etc.).
> * Isolation by default, so system tests are as reliable as possible.
> * Utilities for easily pulling up and tearing down services in clusters in
>   different environments (e.g. local, custom cluster, Vagrant, K8s, Mesos,
>   Docker, cloud providers, etc.).
> * Easy to write unit tests for distributed systems.
> * Adopted and successfully used by another distributed open-source project,
>   Apache Kafka.
> * Collects results (e.g. logs, console output).
> * Reports results (e.g. expected conditions met, performance results, etc.).
>
> WDYT?
>
> [1] https://github.com/nizhikov/ignite/pull/15
> [2] https://github.com/confluentinc/ducktape
> [3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
> [4] https://yadi.sk/d/JC8ciJZjrkdndg
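For illustration, the control flow of the simple rebalance test from the PoC can be sketched in plain Python, the language ducktape tests are written in. This is a minimal, self-contained simulation, not ducktape or Ignite code. `FakeCluster`, its round-robin partition assignment, and the `wait_until` helper are all hypothetical stand-ins. A real test would instead start Ignite nodes on separate hosts through ducktape services. The sketch also records how long the cluster takes to become balanced again, which is the kind of number point 2 of the proposal wants to compare across versions.

```python
import threading
import time


class FakeCluster:
    """Hypothetical in-memory stand-in for an Ignite cluster (not a real Ignite or ducktape API)."""

    def __init__(self, partitions=32):
        self.partitions = partitions
        self.nodes = []   # server node ids, in join order
        self.owner = {}   # partition number -> owning node id
        self.data = {}    # key -> value written by the "client"
        self._lock = threading.Lock()

    def _ideal_owner(self, part):
        # Assumed round-robin affinity; call only while holding self._lock.
        return self.nodes[part % len(self.nodes)]

    def start_server(self, node_id):
        with self._lock:
            self.nodes.append(node_id)
            first = len(self.nodes) == 1
            if first:
                # The first node owns every partition.
                self.owner = {p: node_id for p in range(self.partitions)}
        if not first:
            # A joining node triggers an asynchronous rebalance.
            threading.Thread(target=self._rebalance, daemon=True).start()

    def _rebalance(self):
        # Move one misplaced partition at a time, as if streaming it over the network.
        while True:
            with self._lock:
                misplaced = [p for p, n in self.owner.items() if n != self._ideal_owner(p)]
                if not misplaced:
                    return
                self.owner[misplaced[0]] = self._ideal_owner(misplaced[0])
            time.sleep(0.001)

    def put(self, key, value):
        with self._lock:
            self.data[key] = value

    def is_rebalanced(self):
        with self._lock:
            return all(n == self._ideal_owner(p) for p, n in self.owner.items())

    def partition_counts(self):
        with self._lock:
            counts = {n: 0 for n in self.nodes}
            for n in self.owner.values():
                counts[n] += 1
            return counts


def wait_until(condition, timeout_sec, backoff_sec=0.01):
    """Poll condition() until it returns True; raise TimeoutError after timeout_sec."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    raise TimeoutError(f"condition not met within {timeout_sec} s")


def run_simple_rebalance_test():
    cluster = FakeCluster(partitions=32)
    # 1. Start 2 server nodes.
    cluster.start_server("server-1")
    cluster.start_server("server-2")
    # 2. Create some data with a client.
    for i in range(1000):
        cluster.put(i, f"value-{i}")
    # 3. Start one more server node; measure until the cluster is balanced again.
    started = time.monotonic()
    cluster.start_server("server-3")
    # 4. Wait for the rebalance to finish.
    wait_until(cluster.is_rebalanced, timeout_sec=10)
    elapsed = time.monotonic() - started
    return elapsed, cluster.partition_counts(), len(cluster.data)


if __name__ == "__main__":
    elapsed, counts, stored = run_simple_rebalance_test()
    print(f"rebalance took {elapsed:.3f} s, partitions per node: {counts}, keys: {stored}")
```

The test polls for completion rather than sleeping for a fixed time, because rebalance runs asynchronously and its duration is exactly what a benchmark would want to measure.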