Anton, Nikolay,
I want to share some more findings about ducktests that I've stumbled
upon while porting them to Tiden.
The first problem is that GridGain's Tiden-based tests by default use a
real, production-like configuration for Ignite nodes, notably:
- persistence enabled
- ~120 caches in ~40 groups
- a data set of around 1M keys per cache
- primitive and POJO cache values
- extensive use of query entities (indices)
When I tried to run 4 nodes with such a configuration in Docker, my
notebook nearly melted. Nevertheless, the grid started and worked OK,
except for one little 'but': each successive version under test started
slower and slower.
2.7.6 was the fastest, 2.8.0 and 2.8.1 were a little slower, and your
fork (2.9.0-SNAPSHOT) failed to start 4 persistence-enabled nodes within
the default 120-second timeout. To mimic the behavior of your tests, I
had to turn off persistence and use only 1 cache as well.
It's a pity that you completely ignore persistence and indices in your
ducktests; otherwise you would quickly have hit the same limitation.
I hope to adapt the Tiden docker PoC to our TeamCity soon, and then
we'll try to git-bisect to find where this slowdown comes from. After
that I'll file a bug in the IGNITE Jira.
Another problem with your rebalance benchmark is its low accuracy due
to the granularity of measurements.
You don't actually measure rebalance time; you measure the time it takes
to find a specific string in the logs, which is confusing.
The scenario of your test is as follows (sketched below):
1. start 3 server nodes
2. start 1 data-loading client, preload data, stop the client
3. start 1 more server node
4. wait till the new server joins the topology
5. wait till this server node completes exchange and writes the
'rebalanced=true, wasRebalanced=false' message to the log
6. report the time taken by step 5 as 'Rebalance time'
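As I read it, the measured part boils down to roughly the following (a
rough sketch with hypothetical helper names like start_servers() and
wait_for_log(); not the actual ducktests code):

import time

# Rough sketch of the scenario above; all helpers are hypothetical,
# not the real ducktests/ducktape API.
def test_add_node_rebalance(cluster):
    cluster.start_servers(count=3)                       # step 1
    client = cluster.start_client()
    client.preload(entries=1_000_000)                    # step 2
    client.stop()

    new_node = cluster.start_server()                    # step 3
    new_node.wait_for_log("Topology snapshot")           # step 4
    start = time.monotonic()
    new_node.wait_for_log("rebalanced=true, wasRebalanced=false")  # step 5
    return {"Rebalanced in (sec)": time.monotonic() - start}       # step 6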
The confusing thing here is the 'wait till' implementation: you actually
re-scan the logs in a loop, sleeping one second between scans, until the
message appears. That means the measured rebalance time has a
granularity of at least one second, even though it is reported with
nanosecond precision.
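In other words, the wait_for_log() step from the sketch above is
essentially a poll loop like this (a minimal sketch assuming the
1-second sleep described above; the real ducktape utility may differ in
details):

import time

def wait_for_log(node, message, timeout=120.0, backoff=1.0):
    """Hypothetical helper: returns the seconds spent waiting for `message`."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if message in node.read_log():   # hypothetical log-scan call
            # If the message was already in the log, the result is just the
            # cost of one scan (~0.07 s on my laptop); if it appears right
            # after the first scan, the result jumps past the sleep (~1.02 s).
            return time.monotonic() - start
        time.sleep(backoff)              # this sleep caps the accuracy
    raise TimeoutError("'%s' not found within %s s" % (message, timeout))

Note that with such a loop, any value between the cost of one scan and
roughly one second is simply unreachable.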
But for such a lightweight configuration (a single in-memory cache) and
such a small data set (only 1M keys), rebalancing is very fast and
usually completes in under 1 second, or only slightly longer.
Before waiting for the rebalance message you first wait for the topology
message, and that wait also takes time to execute.
So, by the time the Python part of the test performs the first scan of
the logs, rebalancing is in most cases already done, and the time you
report as '0.0760810375213623' is actually the time it takes to execute
the log-scanning code. However, if rebalancing finishes just a little
later after the topology update, the first log scan fails, you sleep for
a whole second, rescan the logs, find the message there, and report it
as '1.02205491065979'.
Under different conditions, a dockerized application may run a little
slower or faster depending on overall system load, free memory, etc.
I tried increasing the load on my laptop by running a browser or a Maven
build, and the time to scan the logs fluctuated from 0.02 to 0.09, or
even 1.02, seconds. Note that in a CI environment, high system load from
other tenants is quite an ordinary situation.
Suppose we adopt the rebalance improvements and all versions after 2.9.0
perform within 1 second, just like 2.9.0 itself. Then your benchmark
might report a false negative (e.g. 0.02 for master and 0.03 for the
PR), while on the next re-run it would pass (e.g. 0.07 for master and
0.03 for the PR). That's not quite the 'stable and non-flaky' test the
Ignite community wants.
What suggestions do you have to improve benchmark measurement accuracy?
A third question is about the PME-free switch benchmark. Under some
conditions, LongTxStreamerApplication actually hangs PME. This needs to
be investigated further, but it was either due to persistence being
enabled or due to a missing -DIGNITE_ALLOW_ATOMIC_OPS_IN_TX=false.
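For reference, the flag I'm referring to is an ordinary JVM system
property, i.e. something that would be passed to the node JVMs roughly
like this (a sketch with a hypothetical start call; the exact hook for
JVM options in ducktests may differ):

def start_with_flag(cluster):
    # Hypothetical node-start call; the point is only that the flag is an
    # ordinary JVM system property on the server/streamer JVMs.
    cluster.start_servers(count=4,
                          jvm_opts=["-DIGNITE_ALLOW_ATOMIC_OPS_IN_TX=false"])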
Can you share some details about the IGNITE_ALLOW_ATOMIC_OPS_IN_TX
option?
Also, have you tested the PME-free switch with persistence-enabled
caches?
On 09.07.2020 10:11, Max Shonichev wrote:
Anton,
well, strange thing, but clean up and rerun helped.
Ubuntu 18.04
====================================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.7.7
session_id: 2020-07-06--003
run time: 4 minutes 44.835 seconds
tests run: 5
passed: 5
failed: 0
ignored: 0
====================================================================================================
test_id:
ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
status: PASS
run time: 41.927 seconds
{"Rebalanced in (sec)": 1.02205491065979}
----------------------------------------------------------------------------------------------------
test_id:
ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev
status: PASS
run time: 51.985 seconds
{"Rebalanced in (sec)": 0.0760810375213623}
----------------------------------------------------------------------------------------------------
test_id:
ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
status: PASS
run time: 1 minute 4.283 seconds
{"Streamed txs": "1900", "Measure duration (ms)": "34818", "Worst
latency (ms)": "31035"}
----------------------------------------------------------------------------------------------------
test_id:
ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=dev
status: PASS
run time: 1 minute 13.089 seconds
{"Streamed txs": "73134", "Measure duration (ms)": "35843", "Worst
latency (ms)": "139"}
----------------------------------------------------------------------------------------------------
test_id:
ignitetest.tests.spark_integration_test.SparkIntegrationTest.test_spark_client
status: PASS
run time: 53.332 seconds
----------------------------------------------------------------------------------------------------
MacBook
================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.7.7
session_id: 2020-07-06--001
run time: 6 minutes 58.612 seconds
tests run: 5
passed: 5
failed: 0
ignored: 0
================================================================================
test_id:
ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
status: PASS
run time: 48.724 seconds
{"Rebalanced in (sec)": 3.2574470043182373}
--------------------------------------------------------------------------------
test_id:
ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev
status: PASS
run time: 1 minute 23.210 seconds
{"Rebalanced in (sec)": 2.165921211242676}
--------------------------------------------------------------------------------
test_id:
ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
status: PASS
run time: 1 minute 12.659 seconds
{"Streamed txs": "642", "Measure duration (ms)": "33177", "Worst latency
(ms)": "31063"}
--------------------------------------------------------------------------------
test_id:
ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=dev
status: PASS
run time: 1 minute 57.257 seconds
{"Streamed txs": "32924", "Measure duration (ms)": "48252", "Worst
latency (ms)": "1010"}
--------------------------------------------------------------------------------
test_id:
ignitetest.tests.spark_integration_test.SparkIntegrationTest.test_spark_client
status: PASS
run time: 1 minute 36.317 seconds
=============
While the relative proportions remain the same across Ignite versions,
the absolute numbers for mac/linux differ by more than a factor of two.
I'm finalizing the 'local Tiden' appliance code for your tests. The PR
will be ready soon.
Have you had a chance to deploy ducktests on bare metal?
On 06.07.2020 14:27, Anton Vinogradov wrote:
Max,
Thanks for the check!
Is it OK for those tests to fail?
No.
I see really strange things in the logs.
Looks like a concurrent ducktests run started unexpected services,
and this broke the tests.
Could you please clean up Docker (use the clean-up script [1]),
compile the sources (use the script [2]), and rerun the tests.
[1]
https://github.com/anton-vinogradov/ignite/blob/dc98ee9df90b25eb5d928090b0e78b48cae2392e/modules/ducktests/tests/docker/clean_up.sh
[2]
https://github.com/anton-vinogradov/ignite/blob/3c39983005bd9eaf8cb458950d942fb592fff85c/scripts/build.sh
On Mon, Jul 6, 2020 at 12:03 PM Nikolay Izhikov <nizhi...@apache.org>
wrote:
Hello, Maxim.
Thanks for writing down the minutes.
There is no such thing as «Nikolay team» on the dev-list.
I propose to focus on product requirements and what we want to gain from
the framework instead of taking into account the needs of some team.
Can you, please, write down your version of requirements so we can
reach a
consensus on that and therefore move to the discussion of the
implementation?
On 6 Jul 2020, at 11:18, Max Shonichev <mshon...@yandex.ru> wrote:
Yes, Denis,
common ground seems to be as follows:
Anton Vinogradov and Nikolay Izhikov will try to prepare and run a PoC
over physical hosts and share benchmark results. In the meantime, while
I strongly believe that a dockerized approach to benchmarking is a road
to misleading results and false positives, I'll prepare a PoC of Tiden
in a dockerized environment to support the 'fast development
prototyping' use case Nikolay's team insists on. It should be a matter
of a few days.
As a side note, I've run Anton's PoC locally and would like to share
some comments about the results:
Test system: Ubuntu 18.04, docker 19.03.6
Test commands:
git clone -b ignite-ducktape g...@github.com:anton-vinogradov/ignite.git
cd ignite
mvn clean install -DskipTests -Dmaven.javadoc.skip=true
-Pall-java,licenses,lgpl,examples,!spark-2.4,!spark,!scala
cd modules/ducktests/tests/docker
./run_tests.sh
Test results:
====================================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.7.7
session_id: 2020-07-05--004
run time: 7 minutes 36.360 seconds
tests run: 5
passed: 3
failed: 2
ignored: 0
====================================================================================================
test_id:
ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
status: FAIL
run time: 3 minutes 12.232 seconds
----------------------------------------------------------------------------------------------------
test_id:
ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
status: FAIL
run time: 1 minute 33.076 seconds
Is it OK for those tests to fail? Attached is the full test report.
On 02.07.2020 17:46, Denis Magda wrote:
Folks,
Please share the summary of that Slack conversation here for records
once
you find common ground.
-
Denis
On Thu, Jul 2, 2020 at 3:22 AM Nikolay Izhikov <nizhi...@apache.org>
wrote:
Igniters.
All who are interested in integration testing framework discussion
are
welcome into slack channel -
https://join.slack.com/share/zt-fk2ovehf-TcomEAwiXaPzLyNKZbmfzw?cdn_fallback=2
On 2 Jul 2020, at 13:06, Anton Vinogradov <a...@apache.org> wrote:
Max,
Thanks for joining us.
1. tiden can deploy artifacts by itself, while ducktape relies on
dependencies being deployed by external scripts.
No. It is important to distinguish development, deploy, and
orchestration.
All-in-one solutions have extremely limited usability.
As to Ducktests:
Docker is responsible for deployments during development.
CI/CD is responsible for deployments during release and nightly
checks.
It's up to the team to choose AWS, VM, bare metal, and even the OS.
Ducktape is responsible for orchestration.
2. tiden can execute actions over remote nodes in real parallel
fashion,
while ducktape internally does all actions sequentially.
No. Ducktape may start any service in parallel. See Pme-free
benchmark
[1] for details.
if we used ducktape solution we would have to instead prepare some
deployment scripts to pre-initialize Sberbank hosts, for example,
with
Ansible or Chef.
Sure, because a way of deploy depends on infrastructure.
How can we be sure that OS we use and the restrictions we have
will be
compatible with Tiden?
You have solved this deficiency with docker by putting all
dependencies
into one uber-image ...
and
I guess we all know about docker hyped ability to run over
distributed
virtual networks.
It is very important not to confuse the test's development (docker
image
you're talking about) and real deployment.
If we had stopped and started 5 nodes one-by-one, as ducktape does
All actions can be performed in parallel.
See how Ducktests [2] starts cluster in parallel for example.
[1]
https://github.com/apache/ignite/pull/7967/files#diff-59adde2a2ab7dc17aea6c65153dfcda7R84
[2]
https://github.com/apache/ignite/pull/7967/files#diff-d6a7b19f30f349d426b8894a40389cf5R79
On Thu, Jul 2, 2020 at 1:00 PM Nikolay Izhikov <nizhi...@apache.org>
wrote:
Hello, Maxim.
1. tiden can deploy artifacts by itself, while ducktape relies on
dependencies being deployed by external scripts
Why do you think that maintaining deploy scripts coupled with the
testing framework is an advantage?
I thought we want to see and maintain deployment scripts separate
from
the testing framework.
2. tiden can execute actions over remote nodes in real parallel
fashion, while ducktape internally does all actions sequentially.
Can you, please, clarify, what actions do you have in mind?
And why we want to execute them concurrently?
Ignite node start, Client application execution can be done
concurrently
with the ducktape approach.
If we used ducktape solution we would have to instead prepare some
deployment scripts to pre-initialize Sberbank hosts, for example,
with
Ansible or Chef
We shouldn’t take some user approach as an argument in this
discussion.
Let’s discuss a general approach for all users of the Ignite. Anyway,
what
is wrong with the external deployment script approach?
We, as a community, should provide several ways to run integration
tests
out-of-the-box AND the ability to customize deployment regarding the
user
landscape.
You have solved this deficiency with docker by putting all
dependencies into one uber-image and that looks like simple and
elegant
solution however, that effectively limits you to single-host testing.
Docker image should be used only by the Ignite developers to test
something locally.
It’s not intended for some real-world testing.
The main issue with Tiden that I see is that it is tested and
maintained as a closed-source solution.
This can lead to the hard to solve problems when we start using and
maintaining it as an open-source solution.
Like, how many developers have used Tiden? And how many of them were
not authors of Tiden itself?
On 2 Jul 2020, at 12:30, Max Shonichev <mshon...@yandex.ru> wrote:
Anton, Nikolay,
Let's agree on what we are arguing about: whether it is about "like
or
don't like" or about technical properties of suggested solutions.
If it is about likes and dislikes, then the whole discussion is
meaningless. However, I hope together we can analyse pros and cons
carefully.
As far as I can understand now, two main differences between
ducktape
and tiden is that:
1. tiden can deploy artifacts by itself, while ducktape relies on
dependencies being deployed by external scripts.
2. tiden can execute actions over remote nodes in real parallel
fashion, while ducktape internally does all actions sequentially.
As for me, these are very important properties for distributed
testing
framework.
The first property lets us easily reuse Tiden in existing
infrastructures; for example, during Zookeeper IEP testing at the
Sberbank site we used the same Tiden scripts that we use in our lab,
and the only change was putting a list of hosts into the config.
If we used ducktape solution we would have to instead prepare some
deployment scripts to pre-initialize Sberbank hosts, for example,
with
Ansible or Chef.
You have solved this deficiency with docker by putting all
dependencies into one uber-image and that looks like simple and
elegant
solution,
however, that effectively limits you to single-host testing.
I guess we all know about docker hyped ability to run over
distributed
virtual networks. We used to go that way, but quickly found that
it is
more
of the hype than real work. In real environments, there are problems
with
routing, DNS, multicast and broadcast traffic, and many others, that
turn
docker-based distributed solution into a fragile hard-to-maintain
monster.
Please, if you believe otherwise, perform a run of your PoC over at
least two physical hosts and share results with us.
If you consider that one physical docker host is enough, please,
don't
overlook that we want to run real-scale scenarios, with 50-100 cache
groups, persistence enabled, and millions of keys loaded.
Practical limit for such configurations is 4-6 nodes per single
physical host. Otherwise, tests become flaky due to resource
starvation.
Please, if you believe otherwise, perform at least a 10 of runs of
your PoC with other tests running at TC (we're targeting TeamCity,
right?)
and share results so we could check if the numbers are reproducible.
I stress this once more: functional integration tests are OK to run
in
Docker and CI, but running benchmarks in Docker is a big NO GO.
The second property lets us write tests that require truly parallel
actions across hosts.
For example, the agreed scenario for the PME benchmark during the "PME
optimization stream" was as follows:
- 10 server nodes, preloaded with 1M of keys
- 4 client nodes perform transactional load (client nodes
physically
separated from server nodes)
- during load:
-- 5 server nodes stopped in parallel
-- after 1 minute, all 5 nodes are started in parallel
- load stopped, logs are analysed for exchange times.
If we had stopped and started 5 nodes one-by-one, as ducktape does,
then partition map exchange merge would not happen and we could not
have
measured PME optimizations for that case.
These are limitations of ducktape that we believe as a more
important
argument "against" than you provide "for".
On 30.06.2020 14:58, Anton Vinogradov wrote:
Folks,
First, I've created PR [1] with ducktests improvements
PR contains the following changes
- Pme-free switch proof-benchmark (2.7.6 vs master)
- Ability to check (compare with) previous releases (eg. 2.7.6 &
2.8)
- Global refactoring
-- benchmarks javacode simplification
-- services python and java classes code deduplication
-- fail-fast checks for java and python (eg. application should
explicitly write it finished with success)
-- simple results extraction from tests and benchmarks
-- javacode now configurable from tests/benchmarks
-- proper SIGTERM handling at javacode (eg. it may finish last
operation and log results)
-- docker volume now marked as delegated to increase execution
speed
for mac & win users
-- Ignite cluster now start in parallel (start speed-up)
-- Ignite can be configured at test/benchmark
- full and module assembly scripts added
Great job! But let me remind you of one of the Apache Ignite principles:
a week of thinking saves months of development.
Second, I'd like to propose to accept ducktests [2] (ducktape
integration) as a target "PoC check & real topology benchmarking
tool".
Ducktape pros
- Developed for distributed system by distributed system
developers.
So does Tiden
- Developed since 2014, stable.
Tiden is also pretty stable, and the development start date is not a
good argument; for example, pytest has been around since 2004 and
pytest-xdist (a plugin for distributed testing) since 2010, but we
don't see them as an alternative at all.
- Proven usability by usage at Kafka.
Tiden is proven usable by usage at GridGain and Sberbank
deployments.
Core, storage, sql and tx teams use benchmark results provided by
Tiden on a daily basis.
- Dozens of dozens tests and benchmarks at Kafka as a great
example
pack.
We'll donate some of our suites to Ignite as I've mentioned in
previous letter.
- Built-in Docker support for rapid development and checks.
False, there's no specific 'docker support' in ducktape itself, you
just wrap it in docker by yourself, because ducktape is lacking
deployment
abilities.
- Great for CI automation.
False, there's no specific CI-enabled features in ducktape.
Tiden, on
the other hand, provide generic xUnit reporting format, which is
supported
by both TeamCity and Jenkins. Also, instead of using private keys,
Tiden
can use SSH agent, which is also great for CI, because both
TeamCity and Jenkins store keys in secret storage available only
for
ssh-agent and only for the time of the test.
As an additional motivation, at least 3 teams
- IEP-45 team (to check crash-recovery speed-up (discovery and
Zabbix
speed-up))
- Ignite SE Plugins team (to check plugin's features does not
slow-down or broke AI features)
- Ignite SE QA team (to append already developed
smoke/load/failover
tests to AI codebase)
Please, before recommending your tests to other teams, provide
proofs
that your tests are reproducible in real environment.
now, wait for ducktest merge to start checking cases they
working on
in AI way.
Thoughts?
Let us together review both solutions, we'll try to run your
tests in
our lab, and you'll try to at least checkout tiden and see if same
tests
can be implemented with it?
[1] https://github.com/apache/ignite/pull/7967
[2] https://github.com/apache/ignite/tree/ignite-ducktape
On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <nizhi...@apache.org>
wrote:
Hello, Maxim.
Thank you for so detailed explanation.
Can we put the content of this discussion somewhere on the
wiki?
So It doesn’t get lost.
I divide the answer in several parts. From the requirements to
the
implementation.
So, if we agreed on the requirements we can proceed with the
discussion of the implementation.
1. Requirements:
The main goal I want to achieve is *reproducibility* of the
tests.
I’m sick and tired with the zillions of flaky, rarely
failed, and
almost never failed tests in Ignite codebase.
We should start with the simplest scenarios that will be as
reliable
as steel :)
I want to know for sure:
- Is this PR makes rebalance quicker or not?
- Is this PR makes PME quicker or not?
So, your description of the complex test scenario looks as
a next
step to me.
Anyway, It’s cool we already have one.
The second goal is to have a strict test lifecycle as we
have in
JUnit and similar frameworks.
> It covers production-like deployment and running a
scenarios
over
a single database instance.
Do you mean «single cluster» or «single host»?
2. Existing tests:
> A Combinator suite allows to run set of operations
concurrently
over given database instance.
> A Consumption suite allows to run a set production-like
actions
over given set of Ignite/GridGain versions and compare test
metrics
across versions
> A Yardstick suite
> A Stress suite that simulates hardware environment
degradation
> An Ultimate, DR and Compatibility suites that performs
functional
regression testing
> Regression
Great news that we already have so many choices for testing!
Mature test base is a big +1 for Tiden.
3. Comparison:
> Criteria: Test configuration
> Ducktape: single JSON string for all tests
> Tiden: any number of YaML config files, command line option
for
fine-grained test configuration, ability to select/modify
tests
behavior based on Ignite version.
1. Many YAML files can be hard to maintain.
2. In ducktape, you can set parameters via «—parameters»
option.
Please, take a look at the doc [1]
> Criteria: Cluster control
> Tiden: additionally can address cluster as a whole and
execute
remote commands in parallel.
It seems we implement this ability in the PoC, already.
> Criteria: Test assertions
> Tiden: simple asserts, also few customized assertion
helpers.
> Ducktape: simple asserts.
Can you, please, be more specific.
What helpers do you have in mind?
Ducktape has an asserts that waits for logfile messages or
some
process finish.
> Criteria: Test reporting
> Ducktape: limited to its own text/HTML format
Ducktape have
1. Text reporter
2. Customizable HTML reporter
3. JSON reporter.
We can show JSON with the any template or tool.
> Criteria: Provisioning and deployment
> Ducktape: can provision subset of hosts from cluster for
test
needs. However, that means, that test can’t be scaled without
test
code changes. Does not do any deploy, relies on external
means,
e.g.
pre-packaged in docker image, as in PoC.
This is not true.
1. We can set explicit test parameters(node number) via
parameters.
We can increase client count of cluster size without test code
changes.
2. We have many choices for the test environment. These
choices
are
tested and used in other projects:
* docker
* vagrant
* private cloud(ssh access)
* ec2
Please, take a look at Kafka documentation [2]
> I can continue more on this, but it should be enough for
now:
We need to go deeper! :)
[1]
https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
[2]
https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
> On 9 Jun 2020, at 17:25, Max A. Shonichev <mshon...@yandex.ru> wrote:
>
> Greetings, Nikolay,
>
> First of all, thank you for you great effort preparing
PoC of
integration testing to Ignite community.
>
> It’s a shame Ignite did not have at least some such
tests yet,
however, GridGain, as a major contributor to Apache Ignite
had a
profound collection of in-house tools to perform
integration and
performance testing for years already and while we slowly
consider
sharing our expertise with the community, your initiative
makes
us
drive that process a bit faster, thanks a lot!
>
> I reviewed your PoC and want to share a little about
what we
do
on our part, why and how, hope it would help community take
proper
course.
>
> First I’ll do a brief overview of what decisions we made
and
what
we do have in our private code base, next I’ll describe
what we
have
already donated to the public and what we plan public next,
then
I’ll compare both approaches highlighting deficiencies in
order
to
spur public discussion on the matter.
>
> It might seem strange to use Python to run Bash to run Java
applications because that introduces IT industry best of
breed’ –
the Python dependency hell – to the Java application code
base.
The
only strangest decision one can made is to use Maven to run
Docker
to run Bash to run Python to run Bash to run Java, but
desperate
times call for desperate measures I guess.
>
> There are Java-based solutions for integration testing
exists,
e.g. Testcontainers [1], Arquillian [2], etc, and they
might go
well
for Ignite community CI pipelines by them selves. But we also
wanted
to run performance tests and benchmarks, like the dreaded PME
benchmark, and this is solved by totally different set of
tools
in
Java world, e.g. Jmeter [3], OpenJMH [4], Gatling [5], etc.
>
> Speaking specifically about benchmarking, Apache Ignite
community
already has Yardstick [6], and there’s nothing wrong with
writing
PME benchmark using Yardstick, but we also wanted to be
able to
run
scenarios like this:
> - put an X load to a Ignite database;
> - perform an Y set of operations to check how Ignite copes
with
operations under load.
>
> And yes, we also wanted applications under test be deployed
‘like
in a production’, e.g. distributed over a set of hosts. This
arises
questions about provisioning and nodes affinity which I’ll
cover
in
detail later.
>
> So we decided to put a little effort to build a simple
tool to
cover different integration and performance scenarios, and
our QA
lab first attempt was PoC-Tester [7], currently open source
for
all
but for reporting web UI. It’s a quite simple to use 95%
Java-based
tool targeted to be run on a pre-release QA stage.
>
> It covers production-like deployment and running a
scenarios
over
a single database instance. PoC-Tester scenarios consists of a
sequence of tasks running sequentially or in parallel.
After all
tasks complete, or at any time during test, user can run logs
collection task, logs are checked against exceptions and a
summary
of found issues and task ops/latency statistics is
generated at
the
end of scenario. One of the main PoC-Tester features is its
fire-and-forget approach to task managing. That is, you can
deploy
a
grid and left it running for weeks, periodically firing some
tasks
onto it.
>
> During earliest stages of PoC-Tester development it becomes
quite
clear that Java application development is a tedious
process and
architecture decisions you take during development are slow
and
hard
to change.
> For example, scenarios like this
> - deploy two instances of GridGain with master-slave data
replication configured;
> - put a load on master;
> - perform checks on slave,
> or like this:
> - preload a 1Tb of data by using your favorite tool of
choice
to
an Apache Ignite of version X;
> - run a set of functional tests running Apache Ignite
version
Y
over preloaded data,
> do not fit well in the PoC-Tester workflow.
>
> So, this is why we decided to use Python as a generic
scripting
language of choice.
>
> Pros:
> - quicker prototyping and development cycles
> - easier to find DevOps/QA engineer with Python skills than
one
with Java skills
> - used extensively all over the world for DevOps/CI
pipelines
and
thus has rich set of libraries for all possible integration
uses
cases.
>
> Cons:
> - Nightmare with dependencies. Better stick to specific
language/libraries version.
>
> Comparing alternatives for Python-based testing
framework we
have
considered following requirements, somewhat similar to what
you’ve
mentioned for Confluent [8] previously:
> - should be able run locally or distributed (bare metal
or in
the
cloud)
> - should have built-in deployment facilities for
applications
under test
> - should separate test configuration and test code
> -- be able to easily reconfigure tests by simple
configuration
changes
> -- be able to easily scale test environment by simple
configuration changes
> -- be able to perform regression testing by simple
switching
artifacts under test via configuration
> -- be able to run tests with different JDK version by
simple
configuration changes
> - should have human readable reports and/or reporting tools
integration
> - should allow simple test progress monitoring, one does
not
want
to run 6-hours test to find out that application actually
crashed
during first hour.
> - should allow parallel execution of test actions
> - should have clean API for test writers
> -- clean API for distributed remote commands execution
> -- clean API for deployed applications start / stop and
other
operations
> -- clean API for performing check on results
> - should be open source or at least source code should
allow
ease
change or extension
>
> Back at that time we found no better alternative than to
write
our own framework, and here goes Tiden [9] as GridGain
framework
of
choice for functional integration and performance testing.
>
> Pros:
> - solves all the requirements above
> Cons (for Ignite):
> - (currently) closed GridGain source
>
> On top of Tiden we’ve built a set of test suites, some of
which
you might have heard already.
>
> A Combinator suite allows to run set of operations
concurrently
over given database instance. Proven to find at least 30+ race
conditions and NPE issues.
>
> A Consumption suite allows to run a set production-like
actions
over given set of Ignite/GridGain versions and compare test
metrics
across versions, like heap/disk/CPU consumption, time to
perform
actions, like client PME, server PME, rebalancing time, data
replication time, etc.
>
> A Yardstick suite is a thin layer of Python glue code to
run
Apache Ignite pre-release benchmarks set. Yardstick itself
has a
mediocre deployment capabilities, Tiden solves this easily.
>
> A Stress suite that simulates hardware environment
degradation
during testing.
>
> An Ultimate, DR and Compatibility suites that performs
functional
regression testing of GridGain Ultimate Edition features like
snapshots, security, data replication, rolling upgrades, etc.
>
> A Regression and some IEPs testing suites, like IEP-14,
IEP-15,
etc, etc, etc.
>
> Most of the suites above use another in-house developed
Java
tool
– PiClient – to perform actual loading and miscellaneous
operations
with Ignite under test. We use py4j Python-Java gateway
library
to
control PiClient instances from the tests.
>
> When we considered CI, we put TeamCity out of scope,
because
distributed integration and performance tests tend to run for
hours
and TeamCity agents are scarce and costly resource. So,
bundled
with
Tiden there is jenkins-job-builder [10] based CI pipelines and
Jenkins xUnit reporting. Also, rich web UI tool Ward
aggregates
test
run reports across versions and has built in visualization
support
for Combinator suite.
>
> All of the above is currently closed source, but we plan to
make
it public for community, and publishing Tiden core [9] is the
first
step on that way. You can review some examples of using
Tiden for
tests at my repository [11], for start.
>
> Now, let’s compare Ducktape PoC and Tiden.
>
> Criteria: Language
> Tiden: Python, 3.7
> Ducktape: Python, proposes itself as Python 2.7, 3.6, 3.7
compatible, but actually can’t work with Python 3.7 due to
broken
Zmq dependency.
> Comment: Python 3.7 has a much better support for
async-style
code which might be crucial for distributed application
testing.
> Score: Tiden: 1, Ducktape: 0
>
> Criteria: Test writers API
> Supported integration test framework concepts are basically
the
same:
> - a test controller (test runner)
> - a cluster
> - a node
> - an application (a service in Ducktape terms)
> - a test
> Score: Tiden: 5, Ducktape: 5
>
> Criteria: Tests selection and run
> Ducktape: suite-package-class-method level selection,
internal
scheduler allows to run tests in suite in parallel.
> Tiden: also suite-package-class-method level selection,
additionally allows selecting subset of tests by attribute,
parallel
runs not built in, but allows merging test reports after
different
runs.
> Score: Tiden: 2, Ducktape: 2
>
> Criteria: Test configuration
> Ducktape: single JSON string for all tests
> Tiden: any number of YaML config files, command line option
for
fine-grained test configuration, ability to select/modify
tests
behavior based on Ignite version.
> Score: Tiden: 3, Ducktape: 1
>
> Criteria: Cluster control
> Ducktape: allow execute remote commands by node granularity
> Tiden: additionally can address cluster as a whole and
execute
remote commands in parallel.
> Score: Tiden: 2, Ducktape: 1
>
> Criteria: Logs control
> Both frameworks have similar builtin support for remote
logs
collection and grepping. Tiden has built-in plugin that can
zip,
collect arbitrary log files from arbitrary locations at
test/module/suite granularity and unzip if needed, also
application
API to search / wait for messages in logs. Ducktape allows
each
service declare its log files location (seemingly does not
support
logs rollback), and a single entrypoint to collect service
logs.
> Score: Tiden: 1, Ducktape: 1
>
> Criteria: Test assertions
> Tiden: simple asserts, also few customized assertion
helpers.
> Ducktape: simple asserts.
> Score: Tiden: 2, Ducktape: 1
>
> Criteria: Test reporting
> Ducktape: limited to its own text/html format
> Tiden: provides text report, yaml report for reporting
tools
integration, XML xUnit report for integration with
Jenkins/TeamCity.
> Score: Tiden: 3, Ducktape: 1
>
> Criteria: Provisioning and deployment
> Ducktape: can provision subset of hosts from cluster for
test
needs. However, that means, that test can’t be scaled without
test
code changes. Does not do any deploy, relies on external
means,
e.g.
pre-packaged in docker image, as in PoC.
> Tiden: Given a set of hosts, Tiden uses all of them for the
test.
Provisioning should be done by external means. However,
provides
a
conventional automated deployment routines.
> Score: Tiden: 1, Ducktape: 1
>
> Criteria: Documentation and Extensibility
> Tiden: current API documentation is limited, should
change as
we
go open source. Tiden is easily extensible via hooks and
plugins,
see example Maven plugin and Gatling application at [11].
> Ducktape: basic documentation at readthedocs.io. Codebase is rigid,
framework core is
tightly coupled and hard to change. The only possible
extension
mechanism is fork-and-rewrite.
> Score: Tiden: 2, Ducktape: 1
>
> I can continue more on this, but it should be enough for
now:
> Overall score: Tiden: 22, Ducktape: 14.
>
> Time for discussion!
>
> ---
> [1] - https://www.testcontainers.org/
> [2] - http://arquillian.org/guides/getting_started/
> [3] - https://jmeter.apache.org/index.html
> [4] - https://openjdk.java.net/projects/code-tools/jmh/
> [5] - https://gatling.io/docs/current/
> [6] - https://github.com/gridgain/yardstick
> [7] - https://github.com/gridgain/poc-tester
> [8] -
https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
> [9] - https://github.com/gridgain/tiden
> [10] - https://pypi.org/project/jenkins-job-builder/
> [11] - https://github.com/mshonichev/tiden_examples
>
> On 25.05.2020 11:09, Nikolay Izhikov wrote:
>> Hello,
>>
>> Branch with duck tape created -
https://github.com/apache/ignite/tree/ignite-ducktape
>>
>> Any who are willing to contribute to PoC are welcome.
>>
>>
>>> On 21 May 2020, at 22:33, Nikolay Izhikov <nizhikov....@gmail.com> wrote:
>>>
>>> Hello, Denis.
>>>
>>> There is no rush with these improvements.
>>> We can wait for Maxim proposal and compare two
solutions :)
>>>
>>>> On 21 May 2020, at 22:24, Denis Magda <dma...@apache.org> wrote:
>>>>
>>>> Hi Nikolay,
>>>>
>>>> Thanks for kicking off this conversation and sharing
your
findings with the
>>>> results. That's the right initiative. I do agree that
Ignite
needs to have
>>>> an integration testing framework with capabilities
listed
by
you.
>>>>
>>>> As we discussed privately, I would only check if
instead of
>>>> Confluent's Ducktape library, we can use an integration
testing framework
>>>> developed by GridGain for testing of Ignite/GridGain
clusters.
That
>>>> framework has been battle-tested and might be more
convenient for
>>>> Ignite-specific workloads. Let's wait for @Maksim
Shonichev
>>>> <mshonic...@gridgain.com> who
promised to join this thread once he finishes
>>>> preparing the usage examples of the framework. To my
knowledge, Max has
>>>> already been working on that for several days.
>>>>
>>>> -
>>>> Denis
>>>>
>>>>
>>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov
<nizhi...@apache.org>
>>>> wrote:
>>>>
>>>>> Hello, Igniters.
>>>>>
>>>>> I created a PoC [1] for the integration tests of
Ignite.
>>>>>
>>>>> Let me briefly explain the gap I want to cover:
>>>>>
>>>>> 1. For now, we don’t have a solution for automated
testing
of
Ignite on
>>>>> «real cluster».
>>>>> By «real cluster» I mean cluster «like a production»:
>>>>> * client and server nodes deployed on different
hosts.
>>>>> * thin clients perform queries from some other
hosts
>>>>> * etc.
>>>>>
>>>>> 2. We don’t have a solution for automated benchmarks of
some
internal
>>>>> Ignite process
>>>>> * PME
>>>>> * rebalance.
>>>>> This means we don’t know - Do we perform
rebalance(or PME)
in
2.7.0 faster
>>>>> or slower than in 2.8.0 for the same cluster?
>>>>>
>>>>> 3. We don’t have a solution for automated testing of
Ignite
integration in
>>>>> a real-world environment:
>>>>> Ignite-Spark integration can be taken as an example.
>>>>> I think some ML solutions also should be tested in
real-world
deployments.
>>>>>
>>>>> Solution:
>>>>>
>>>>> I propose to use duck tape library from confluent
(apache
2.0
license)
>>>>> I tested it both on the real cluster(Yandex Cloud)
and on
the
local
>>>>> environment(docker) and it works just fine.
>>>>>
>>>>> PoC contains following services:
>>>>>
>>>>> * Simple rebalance test:
>>>>> Start 2 server nodes,
>>>>> Create some data with Ignite client,
>>>>> Start one more server node,
>>>>> Wait for rebalance finish
>>>>> * Simple Ignite-Spark integration test:
>>>>> Start 1 Spark master, start 1 Spark
worker,
>>>>> Start 1 Ignite server node
>>>>> Create some data with Ignite client,
>>>>> Check data in application that queries it
from
Spark.
>>>>>
>>>>> All tests are fully automated.
>>>>> Logs collection works just fine.
>>>>> You can see an example of the tests report - [4].
>>>>>
>>>>> Pros:
>>>>>
>>>>> * Ability to test local changes(no need to public
changes
to
some remote
>>>>> repository or similar).
>>>>> * Ability to parametrize test environment(run the same
tests
on different
>>>>> JDK, JVM params, config, etc.)
>>>>> * Isolation by default so system tests are as
reliable as
possible.
>>>>> * Utilities for pulling up and tearing down services
easily
in clusters in
>>>>> different environments (e.g. local, custom cluster,
Vagrant,
K8s, Mesos,
>>>>> Docker, cloud providers, etc.)
>>>>> * Easy to write unit tests for distributed systems
>>>>> * Adopted and successfully used by other distributed
open
source project -
>>>>> Apache Kafka.
>>>>> * Collect results (e.g. logs, console output)
>>>>> * Report results (e.g. expected conditions met,
performance
results, etc.)
>>>>>
>>>>> WDYT?
>>>>>
>>>>> [1] https://github.com/nizhikov/ignite/pull/15
>>>>> [2] https://github.com/confluentinc/ducktape
>>>>> [3]
https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
>>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
<2020-07-05--004.tar.gz>