GitHub user tzulitai opened a pull request: https://github.com/apache/flink/pull/5990
[FLINK-9322][FLINK-9320] [e2e] Improvements to e2e standalone chaos monkey test

## What is the purpose of the change

This PR is based on #5941. Only the last 2 commits are relevant.

This PR improves our standalone e2e chaos monkey test by:
- Using the general purpose DataStream job, instead of the state machine example, to cover a wider range of commonly used DataStream program building blocks.
- Letting the running job simulate failures by throwing exceptions. This makes the chaos monkey test more intensive.

## Brief change log

- b01cfda Allows the general purpose job to configure whether or not to simulate failures. This resolves FLINK-9322.
- 4009406 In `test_ha.sh`, uses the general purpose job instead. With this change, the e2e test also covers failures caused by the user application, not just TM / JM shutdowns. It also changes the parameterization of the test script to be consistent with our other e2e test scripts.

## Verifying this change

This is purely a change to improve current e2e tests.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
- The serializers: (yes / **no** / don't know)
- The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
- The S3 file system connector: (yes / **no** / don't know)

## Documentation

- Does this pull request introduce a new feature? (yes / **no**)
- If yes, how is the feature documented?
  (**not applicable** / docs / JavaDocs / not documented)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tzulitai/flink chaos-monkey-e2e

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5990.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5990

----
commit 8db7f894b67b00f94148e0314a1c10d76266a350
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date:   2018-04-30T10:04:43Z

    [hotfix] [e2e-tests] Make SequenceGeneratorSource usable for 0-size key ranges

commit c8e14673e58aed0f9625e38875ec85a776282ad4
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date:   2018-04-30T10:05:46Z

    [FLINK-8971] [e2e-tests] Include broadcast / union state in general purpose DataStream job

commit 78354b295832fa2ec5d829ec4ac21150ecac1231
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date:   2018-05-08T03:44:13Z

    PR review - refactor source run function

commit f346fd0958e7c3361886680912630fe22761a63d
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date:   2018-05-08T04:39:40Z

    PR review - simplify broadcast / union state verification

commit b01cfda7d77723e8ded2ce99ee12f17352a3ca1f
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date:   2018-05-11T03:51:12Z

    [FLINK-9322] [e2e] Add failure simulation to the general purpose DataStream job

commit 4009406d4729486d57cc4a71bcb72d269583a762
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date:   2018-05-11T07:09:00Z

    [FLINK-9320] [e2e] Update test_ha e2e to use general purpose DataStream job

----