I think you are right, and I like the idea of failing the build fast. However, when I tried this approach on my local machine it didn't help: the build didn't crash (probably because of overcommit). Did you try this approach in your VM?
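(For what it's worth, the kernel's overcommit policy can be checked like this - a minimal sketch, assuming a standard Linux VM:

cat /proc/sys/vm/overcommit_memory    # 0 = heuristic overcommit (default), 1 = always overcommit, 2 = strict accounting

With the default heuristic policy, the heap set by -Xms is committed as virtual memory but physical pages are only allocated when they are first touched, so the JVM starts fine even if the heap could never fit - the pressure only shows up later, as the OOM killer. Adding -XX:+AlwaysPreTouch to the argLine would force the heap pages to be touched at JVM startup, though under real memory pressure that probably just invites the OOM killer earlier rather than producing a clean startup failure.)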
Regards, Roman On Tue, Oct 20, 2020 at 12:12 PM Juha Mynttinen <juha.myntti...@gmail.com> wrote: > Hey, > > > Currently, tests do not run in parallel > > I don't think this is true, at least not 100%. In 'top' it's clearly visible > that there are multiple JVMs. If tests are not running in parallel, what are > these doing? In the main pom.xml there's configuration for the plug-in > 'maven-surefire-plugin'. > > I'm not a Maven expert, but it looks to me like this: in > https://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html > it says "The other possibility for parallel test execution is setting the > parameter forkCount to a value higher than 1". I think that's what's happening > in Flink: > > <forkCount>${flink.forkCount}</forkCount> > > And > > <flink.forkCount>1C</flink.forkCount> > > This means there are going to be 1 * count_of_cpus forks. > > And this one: > > <argLine>-Xms256m -Xmx2048m -Dmvn.forkNumber=${surefire.forkNumber} > -XX:+UseG1GC</argLine> > > In my case, I have 5 CPUs, so 5 forks. I think what now happens is that > since each fork gets a max of 2048 MB of heap, there's effectively a memory requirement of CPU > count * 2048 MB. In my case, I have 8 GB of memory, which is less than the max of 5 * > 2048 MB. > > This could be better..... I think a computer that > has RAM < count_of_cpus * 2048 MB is completely valid, take e.g. an AMD Ryzen 3900X with 12 cores > and put 16 GB of RAM in it. At the very least, the memory & CPU requirements should be > documented? > > If the tests really need 2 GB of heap, then maybe the forkCount should be > based on the available RAM rather than the available cores, e.g. floor(RAM / > 2GB)? I don't know if that's doable in Maven.... > > I think an easy and non-intrusive improvement would be to change > '-Xms256m' to '-Xms2048m' (Xms to match Xmx) so that the JVM would allocate > 2048 MB right away (when it starts). If there's not enough memory, the tests > would fail immediately (the JVM couldn't start). The tests would probably fail > anyway (as in my case) - better to fail fast. > > Regards, > Juha > > > > > > > > > On Tue, Oct 20, 2020 at 11:16 AM Khachatryan Roman (< > khachatryan.ro...@gmail.com>) wrote: > >> Thanks for sharing this, >> I think the activity of the OOM killer means high memory pressure (it just >> kills the process with the highest memory-consumption score). >> High CPU usage can only be a consequence of it, i.e. constant GC. >> >> Currently, tests do not run in parallel, but high memory usage can be >> caused by the nature of the test (e.g. running Flink with high parallelism). >> So I think the best way to deal with this is to use a VM with more memory. >> >> Regards, >> Roman >> >> >> On Tue, Oct 20, 2020 at 8:56 AM Juha Mynttinen <juha.myntti...@gmail.com> >> wrote: >> >>> Hey, >>> >>> Good hint about /var/log/kern.log. This time I can see this: >>> >>> Oct 20 09:44:48 ubuntu kernel: [ 1925.651551] >>> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service >>> ,task=java,pid=270024,uid=1000 >>> Oct 20 09:44:48 ubuntu kernel: [ 1925.651632] Out of memory: Killed >>> process 270024 (java) total-vm:9841596kB, anon-rss:4820380kB, file-rss:0kB, >>> shmem-rss:0kB, UID:1000 pgtables:11780kB oom_score_adj:0 >>> Oct 20 09:44:48 ubuntu kernel: [ 1925.844155] oom_reaper: reaped process >>> 270024 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB >>> >>> The next question is why does this happen.... I'll try to dig deeper.
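(A side note on the numbers: the surefire error quoted at the bottom of this thread reports "Process Exit Code: 137", i.e. 128 + 9, so the fork was SIGKILLed - consistent with the oom-kill entry above, where the killed java process had an anon-rss of ~4.8 GB on an ~8 GB machine while the other forks were still running. That resident size also suggests sizeable off-heap allocation on top of the 2 GB heap. If the flink.forkCount property can be overridden from the command line (Maven normally lets -D user properties take precedence over pom-defined properties), capping the fork count would be a quick way to test the memory theory - an illustrative, unverified sketch:

apache-maven-3.2.5/bin/mvn clean verify -Dflink.forkCount=2

With two forks at up to 2 GB of heap each, plus the Maven JVM itself, an 8 GB machine should have some headroom.)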
>>> About the CPU load. I have five CPUs. Theoretically it makes sense to >>> run five tests at a time to max out the CPUs. However, when I look at what >>> the five Java processes (that MVN forks) are doing, it can be seen that >>> each of those processes has a large number of threads wanting to use CPU. >>> Here's an example from 'top -H': >>> >>> top - 09:42:03 up 29 min, 1 user, load average: 17,00, 12,86, 8,81 >>> Threads: 1099 total, 21 running, 1078 sleeping, 0 stopped, 0 zombie >>> %Cpu(s): 90,5 us, 9,4 sy, 0,0 ni, 0,0 id, 0,0 wa, 0,0 hi, 0,1 si, >>> 0,0 st >>> MiB Mem : 7961,6 total, 1614,3 free, 4023,8 used, 2323,5 >>> buff/cache >>> MiB Swap: 2048,0 total, 2047,0 free, 1,0 used. 3638,9 avail >>> Mem >>> >>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ >>> COMMAND >>> >>> 254825 juha 20 0 4250424 195768 27596 R 20,9 2,4 0:01.41 >>> C2 CompilerThre >>> >>> 255116 juha 20 0 2820448 99240 27488 R 20,3 1,2 0:00.78 >>> java >>> >>> 254968 juha 20 0 5312696 125212 27716 R 19,9 1,5 0:01.16 >>> java >>> >>> 255027 juha 20 0 5310648 108716 27496 R 19,9 1,3 0:00.90 >>> java >>> >>> 255123 juha 20 0 2820448 99120 27420 R 19,3 1,2 0:00.78 >>> java >>> >>> 254829 juha 20 0 4240356 184376 27792 R 17,9 2,3 0:01.26 >>> C2 CompilerThre >>> >>> 253993 juha 20 0 6436132 276808 28000 R 17,6 3,4 0:02.47 >>> C2 CompilerThre >>> >>> 254793 juha 20 0 4250424 195768 27596 R 17,3 2,4 0:01.76 >>> java >>> >>> 254801 juha 20 0 4240356 184376 27792 R 16,3 2,3 0:01.67 >>> java >>> >>> 254298 juha 20 0 6510340 435360 28212 R 15,6 5,3 0:02.82 >>> C2 CompilerThre >>> >>> 255145 juha 20 0 2820448 99240 27488 S 15,6 1,2 0:00.51 >>> C2 CompilerThre >>> >>> 255045 juha 20 0 5310648 108716 27496 R 15,3 1,3 0:00.62 >>> C2 CompilerThre >>> >>> 255151 juha 20 0 2820448 99120 27420 S 14,0 1,2 0:00.47 >>> C2 CompilerThre >>> >>> 254986 juha 20 0 5312696 125212 27716 R 12,6 1,5 0:00.76 >>> C2 CompilerThre >>> >>> 253980 juha 20 0 6436132 276808 28000 S 11,6 3,4 0:02.63 >>> java >>> >>> 255148 juha 20 0 2820448 99240 27488 S 10,6 1,2 0:00.39 >>> C1 CompilerThre >>> >>> 255154 juha 20 0 2820448 99120 27420 S 9,6 1,2 0:00.37 >>> C1 CompilerThre >>> >>> 254457 juha 20 0 4269900 218036 28236 R 9,3 2,7 0:02.22 >>> C2 CompilerThre >>> >>> 254299 juha 20 0 6510340 435360 28212 S 8,6 5,3 0:01.30 >>> C1 CompilerThre >>> >>> 255047 juha 20 0 5310648 108716 27496 S 8,6 1,3 0:00.42 >>> C1 CompilerThre >>> >>> 253994 juha 20 0 6436132 276808 28000 R 7,3 3,4 0:01.10 >>> C1 CompilerThre >>> >>> 255312 juha 20 0 4250424 195768 27596 R 7,0 2,4 0:00.21 >>> C2 CompilerThre >>> >>> 254831 juha 20 0 4240356 184376 27792 S 6,3 2,3 0:00.62 >>> C1 CompilerThre >>> >>> 254988 juha 20 0 5312696 125212 27716 S 6,3 1,5 0:00.45 >>> C1 CompilerThre >>> >>> 254828 juha 20 0 4250424 195768 27596 S 6,0 2,4 0:00.64 >>> C1 CompilerThre >>> >>> 254720 juha 20 0 6510340 435360 28212 S 5,0 5,3 0:00.15 >>> flink-akka.acto >>> >>> >>> It can be seen that the JIT-related threads consume quite a lot of CPU, >>> essentially leaving less CPU available to the actual test code. By using >>> htop I can also see the garbage-collection-related threads eating CPU. This >>> doesn't seem right. I think it'd make sense to run the tests with less >>> parallelism to better utilize the CPUs. Having far more threads wanting >>> CPU than there are CPUs slows things down (it doesn't speed things up). >>> >>> However, AFAIK high CPU load shouldn't trigger the OOM killer? >>> >>> Regards, >>> Juha >>> >>> >>> >>> >>> El lun., 19 oct. 
2020 a las 20:48, Khachatryan Roman (< >>> khachatryan.ro...@gmail.com>) escribió: >>> >>>> Hey, >>>> >>>> One reason could be that a resource-intensive test was killed by oom >>>> killer. You can inspect /var/log/kern.log for the related messages in your >>>> VM. >>>> >>>> Regards, >>>> Roman >>>> >>>> >>>> On Mon, Oct 19, 2020 at 5:57 PM Juha Mynttinen < >>>> juha.myntti...@gmail.com> wrote: >>>> >>>>> >>>>> Hey, >>>>> >>>>> I'm trying to build Flink and failing. I'm running Ubuntu 20.04.1 in >>>>> a virtual machine on Windows 10. I'm using OpenJDK 11.0.8. I'm on the >>>>> master branch, commit 9eae578ae592254d54bc51c679644e8e84c65152. >>>>> >>>>> The command I'm using: >>>>> >>>>> apache-maven-3.2.5/bin/mvn clean verify >>>>> >>>>> The output: >>>>> >>>>> [INFO] Flink : Tests ...................................... FAILURE >>>>> [14:38 min] >>>>> [INFO] Flink : Streaming Scala ............................ SKIPPED >>>>> [INFO] Flink : Connectors : HCatalog ...................... SKIPPED >>>>> [INFO] Flink : Connectors : Base .......................... SKIPPED >>>>> [INFO] Flink : Connectors : Files ......................... SKIPPED >>>>> [INFO] Flink : Table : .................................... SKIPPED >>>>> [INFO] Flink : Table : Common ............................. SKIPPED >>>>> [INFO] Flink : Table : API Java ........................... SKIPPED >>>>> [INFO] Flink : Table : API Java bridge .................... SKIPPED >>>>> [INFO] Flink : Table : API Scala .......................... SKIPPED >>>>> [INFO] Flink : Table : API Scala bridge ................... SKIPPED >>>>> [INFO] Flink : Table : SQL Parser ......................... SKIPPED >>>>> [INFO] Flink : Libraries : ................................ SKIPPED >>>>> [INFO] Flink : Libraries : CEP ............................ SKIPPED >>>>> [INFO] Flink : Table : Planner ............................ SKIPPED >>>>> [INFO] Flink : Table : SQL Parser Hive .................... SKIPPED >>>>> [INFO] Flink : Table : Runtime Blink ...................... SKIPPED >>>>> [INFO] Flink : Table : Planner Blink ...................... SKIPPED >>>>> [INFO] Flink : Metrics : JMX .............................. SKIPPED >>>>> [INFO] Flink : Formats : .................................. SKIPPED >>>>> [INFO] Flink : Formats : Json ............................. SKIPPED >>>>> [INFO] Flink : Connectors : Kafka base .................... SKIPPED >>>>> [INFO] Flink : Connectors : Elasticsearch base ............ SKIPPED >>>>> [INFO] Flink : Connectors : Elasticsearch 5 ............... SKIPPED >>>>> [INFO] Flink : Connectors : Elasticsearch 6 ............... SKIPPED >>>>> [INFO] Flink : Connectors : Elasticsearch 7 ............... SKIPPED >>>>> [INFO] Flink : Connectors : HBase base .................... SKIPPED >>>>> [INFO] Flink : Connectors : HBase 1.4 ..................... SKIPPED >>>>> [INFO] Flink : Connectors : HBase 2.2 ..................... SKIPPED >>>>> [INFO] Flink : Formats : Hadoop bulk ...................... SKIPPED >>>>> [INFO] Flink : Formats : Orc .............................. SKIPPED >>>>> [INFO] Flink : Formats : Orc nohive ....................... SKIPPED >>>>> [INFO] Flink : Formats : Avro ............................. SKIPPED >>>>> [INFO] Flink : Formats : Parquet .......................... SKIPPED >>>>> [INFO] Flink : Formats : Csv .............................. SKIPPED >>>>> [INFO] Flink : Connectors : Hive .......................... SKIPPED >>>>> [INFO] Flink : Connectors : JDBC .......................... 
SKIPPED >>>>> [INFO] Flink : Connectors : RabbitMQ ...................... SKIPPED >>>>> [INFO] Flink : Connectors : Twitter ....................... SKIPPED >>>>> [INFO] Flink : Connectors : Nifi .......................... SKIPPED >>>>> [INFO] Flink : Connectors : Cassandra ..................... SKIPPED >>>>> [INFO] Flink : Connectors : Filesystem .................... SKIPPED >>>>> [INFO] Flink : Connectors : Kafka ......................... SKIPPED >>>>> [INFO] Flink : Connectors : Google PubSub ................. SKIPPED >>>>> [INFO] Flink : Connectors : Kinesis ....................... SKIPPED >>>>> [INFO] Flink : Connectors : SQL : Elasticsearch 6 ......... SKIPPED >>>>> [INFO] Flink : Connectors : SQL : Elasticsearch 7 ......... SKIPPED >>>>> [INFO] Flink : Connectors : SQL : HBase 1.4 ............... SKIPPED >>>>> [INFO] Flink : Connectors : SQL : HBase 2.2 ............... SKIPPED >>>>> [INFO] Flink : Connectors : SQL : Hive 1.2.2 .............. SKIPPED >>>>> [INFO] Flink : Connectors : SQL : Hive 2.2.0 .............. SKIPPED >>>>> [INFO] Flink : Connectors : SQL : Hive 2.3.6 .............. SKIPPED >>>>> [INFO] Flink : Connectors : SQL : Hive 3.1.2 .............. SKIPPED >>>>> [INFO] Flink : Connectors : SQL : Kafka ................... SKIPPED >>>>> [INFO] Flink : Formats : Avro confluent registry .......... SKIPPED >>>>> [INFO] Flink : Formats : Sequence file .................... SKIPPED >>>>> [INFO] Flink : Formats : Compress ......................... SKIPPED >>>>> [INFO] Flink : Formats : SQL Orc .......................... SKIPPED >>>>> [INFO] Flink : Formats : SQL Parquet ...................... SKIPPED >>>>> [INFO] Flink : Formats : SQL Avro ......................... SKIPPED >>>>> [INFO] Flink : Examples : Streaming ....................... SKIPPED >>>>> [INFO] Flink : Examples : Table ........................... SKIPPED >>>>> [INFO] Flink : Examples : Build Helper : .................. SKIPPED >>>>> [INFO] Flink : Examples : Build Helper : Streaming Twitter SKIPPED >>>>> [INFO] Flink : Examples : Build Helper : Streaming State machine >>>>> SKIPPED >>>>> [INFO] Flink : Examples : Build Helper : Streaming Google PubSub >>>>> SKIPPED >>>>> [INFO] Flink : Container .................................. SKIPPED >>>>> [INFO] Flink : Queryable state : Runtime .................. SKIPPED >>>>> [INFO] Flink : Mesos ...................................... SKIPPED >>>>> [INFO] Flink : Kubernetes ................................. SKIPPED >>>>> [INFO] Flink : Yarn ....................................... SKIPPED >>>>> [INFO] Flink : Libraries : Gelly .......................... SKIPPED >>>>> [INFO] Flink : Libraries : Gelly scala .................... SKIPPED >>>>> [INFO] Flink : Libraries : Gelly Examples ................. SKIPPED >>>>> [INFO] Flink : External resources : ....................... SKIPPED >>>>> [INFO] Flink : External resources : GPU ................... SKIPPED >>>>> [INFO] Flink : Metrics : Dropwizard ....................... SKIPPED >>>>> [INFO] Flink : Metrics : Graphite ......................... SKIPPED >>>>> [INFO] Flink : Metrics : InfluxDB ......................... SKIPPED >>>>> [INFO] Flink : Metrics : Prometheus ....................... SKIPPED >>>>> [INFO] Flink : Metrics : StatsD ........................... SKIPPED >>>>> [INFO] Flink : Metrics : Datadog .......................... SKIPPED >>>>> [INFO] Flink : Metrics : Slf4j ............................ SKIPPED >>>>> [INFO] Flink : Libraries : CEP Scala ...................... 
SKIPPED >>>>> [INFO] Flink : Table : Uber ............................... SKIPPED >>>>> [INFO] Flink : Table : Uber Blink ......................... SKIPPED >>>>> [INFO] Flink : Python ..................................... SKIPPED >>>>> [INFO] Flink : Table : SQL Client ......................... SKIPPED >>>>> [INFO] Flink : Libraries : State processor API ............ SKIPPED >>>>> [INFO] Flink : ML : ....................................... SKIPPED >>>>> [INFO] Flink : ML : API ................................... SKIPPED >>>>> [INFO] Flink : ML : Lib ................................... SKIPPED >>>>> [INFO] Flink : ML : Uber .................................. SKIPPED >>>>> [INFO] Flink : Scala shell ................................ SKIPPED >>>>> [INFO] Flink : Dist ....................................... SKIPPED >>>>> [INFO] Flink : Yarn Tests ................................. SKIPPED >>>>> [INFO] Flink : E2E Tests : ................................ SKIPPED >>>>> [INFO] Flink : E2E Tests : CLI ............................ SKIPPED >>>>> [INFO] Flink : E2E Tests : Parent Child classloading program SKIPPED >>>>> [INFO] Flink : E2E Tests : Parent Child classloading lib-package >>>>> SKIPPED >>>>> [INFO] Flink : E2E Tests : Dataset allround ............... SKIPPED >>>>> [INFO] Flink : E2E Tests : Dataset Fine-grained recovery .. SKIPPED >>>>> [INFO] Flink : E2E Tests : Datastream allround ............ SKIPPED >>>>> [INFO] Flink : E2E Tests : Batch SQL ...................... SKIPPED >>>>> [INFO] Flink : E2E Tests : Stream SQL ..................... SKIPPED >>>>> [INFO] Flink : E2E Tests : Bucketing sink ................. SKIPPED >>>>> [INFO] Flink : E2E Tests : Distributed cache via blob ..... SKIPPED >>>>> [INFO] Flink : E2E Tests : High parallelism iterations .... SKIPPED >>>>> [INFO] Flink : E2E Tests : Stream stateful job upgrade .... SKIPPED >>>>> [INFO] Flink : E2E Tests : Queryable state ................ SKIPPED >>>>> [INFO] Flink : E2E Tests : Local recovery and allocation .. SKIPPED >>>>> [INFO] Flink : E2E Tests : Elasticsearch 5 ................ SKIPPED >>>>> [INFO] Flink : E2E Tests : Elasticsearch 6 ................ SKIPPED >>>>> [INFO] Flink : Quickstart : ............................... SKIPPED >>>>> [INFO] Flink : Quickstart : Java .......................... SKIPPED >>>>> [INFO] Flink : Quickstart : Scala ......................... SKIPPED >>>>> [INFO] Flink : E2E Tests : Quickstart ..................... SKIPPED >>>>> [INFO] Flink : E2E Tests : Confluent schema registry ...... SKIPPED >>>>> [INFO] Flink : E2E Tests : Stream state TTL ............... SKIPPED >>>>> [INFO] Flink : E2E Tests : SQL client ..................... SKIPPED >>>>> [INFO] Flink : E2E Tests : Streaming file sink ............ SKIPPED >>>>> [INFO] Flink : E2E Tests : State evolution ................ SKIPPED >>>>> [INFO] Flink : E2E Tests : RocksDB state memory control ... SKIPPED >>>>> [INFO] Flink : E2E Tests : Common ......................... SKIPPED >>>>> [INFO] Flink : E2E Tests : Metrics availability ........... SKIPPED >>>>> [INFO] Flink : E2E Tests : Metrics reporter prometheus .... SKIPPED >>>>> [INFO] Flink : E2E Tests : Heavy deployment ............... SKIPPED >>>>> [INFO] Flink : E2E Tests : Connectors : Google PubSub ..... SKIPPED >>>>> [INFO] Flink : E2E Tests : Streaming Kafka base ........... SKIPPED >>>>> [INFO] Flink : E2E Tests : Streaming Kafka ................ SKIPPED >>>>> [INFO] Flink : E2E Tests : Plugins : ...................... 
SKIPPED >>>>> [INFO] Flink : E2E Tests : Plugins : Dummy fs ............. SKIPPED >>>>> [INFO] Flink : E2E Tests : Plugins : Another dummy fs ..... SKIPPED >>>>> [INFO] Flink : E2E Tests : TPCH ........................... SKIPPED >>>>> [INFO] Flink : E2E Tests : Streaming Kinesis .............. SKIPPED >>>>> [INFO] Flink : E2E Tests : Elasticsearch 7 ................ SKIPPED >>>>> [INFO] Flink : E2E Tests : Common Kafka ................... SKIPPED >>>>> [INFO] Flink : E2E Tests : TPCDS .......................... SKIPPED >>>>> [INFO] Flink : E2E Tests : Netty shuffle memory control ... SKIPPED >>>>> [INFO] Flink : E2E Tests : Python ......................... SKIPPED >>>>> [INFO] Flink : E2E Tests : HBase .......................... SKIPPED >>>>> [INFO] Flink : State backends : Heap spillable ............ SKIPPED >>>>> [INFO] Flink : Contrib : .................................. SKIPPED >>>>> [INFO] Flink : Contrib : Connectors : Wikiedits ........... SKIPPED >>>>> [INFO] Flink : FileSystems : Tests ........................ SKIPPED >>>>> [INFO] Flink : Docs ....................................... SKIPPED >>>>> [INFO] Flink : Walkthrough : .............................. SKIPPED >>>>> [INFO] Flink : Walkthrough : Common ....................... SKIPPED >>>>> [INFO] Flink : Walkthrough : Datastream Java .............. SKIPPED >>>>> [INFO] Flink : Walkthrough : Datastream Scala ............. SKIPPED >>>>> [INFO] >>>>> ------------------------------------------------------------------------ >>>>> [INFO] BUILD FAILURE >>>>> [INFO] >>>>> ------------------------------------------------------------------------ >>>>> [INFO] Total time: 36:49 min >>>>> [INFO] Finished at: 2020-10-19T18:24:46+03:00 >>>>> [INFO] Final Memory: 179M/614M >>>>> [INFO] >>>>> ------------------------------------------------------------------------ >>>>> [ERROR] Failed to execute goal >>>>> org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test >>>>> (integration-tests) on project flink-tests: There are test failures. >>>>> [ERROR] >>>>> [ERROR] Please refer to >>>>> /home/juha/git/apache-flink/flink-tests/target/surefire-reports for the >>>>> individual test results. >>>>> [ERROR] Please refer to dump files (if any exist) [date].dump, >>>>> [date]-jvmRun[N].dump and [date].dumpstream. >>>>> [ERROR] ExecutionException The forked VM terminated without properly >>>>> saying goodbye. VM crash or System.exit called? >>>>> [ERROR] Command was /bin/sh -c cd >>>>> /home/juha/git/apache-flink/flink-tests/target && >>>>> /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m >>>>> -Dmvn.forkNumber=3 -XX:+UseG1GC -jar >>>>> /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar >>>>> /home/juha/git/apache-flink/flink-tests/target/surefire >>>>> 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp >>>>> surefire_122313349068739873924160tmp >>>>> [ERROR] Error occurred in starting fork, check output in log >>>>> [ERROR] Process Exit Code: 137 >>>>> [ERROR] Crashed tests: >>>>> [ERROR] >>>>> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase >>>>> [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: >>>>> ExecutionException The forked VM terminated without properly saying >>>>> goodbye. VM crash or System.exit called? 
>>>>> [ERROR] Command was /bin/sh -c cd >>>>> /home/juha/git/apache-flink/flink-tests/target && >>>>> /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m >>>>> -Dmvn.forkNumber=3 -XX:+UseG1GC -jar >>>>> /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar >>>>> /home/juha/git/apache-flink/flink-tests/target/surefire >>>>> 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp >>>>> surefire_122313349068739873924160tmp >>>>> [ERROR] Error occurred in starting fork, check output in log >>>>> [ERROR] Process Exit Code: 137 >>>>> [ERROR] Crashed tests: >>>>> [ERROR] >>>>> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase >>>>> [ERROR] at >>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510) >>>>> [ERROR] at >>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457) >>>>> [ERROR] at >>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298) >>>>> [ERROR] at >>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246) >>>>> [ERROR] at >>>>> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183) >>>>> [ERROR] at >>>>> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011) >>>>> [ERROR] at >>>>> org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857) >>>>> [ERROR] at >>>>> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132) >>>>> [ERROR] at >>>>> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) >>>>> [ERROR] at >>>>> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153) >>>>> [ERROR] at >>>>> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145) >>>>> [ERROR] at >>>>> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116) >>>>> [ERROR] at >>>>> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80) >>>>> [ERROR] at >>>>> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51) >>>>> [ERROR] at >>>>> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120) >>>>> [ERROR] at >>>>> org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355) >>>>> [ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155) >>>>> [ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584) >>>>> [ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216) >>>>> [ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160) >>>>> [ERROR] at >>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native >>>>> Method) >>>>> [ERROR] at >>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) >>>>> [ERROR] at >>>>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>>>> [ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:566) >>>>> [ERROR] at >>>>> org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289) >>>>> [ERROR] at >>>>> org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229) >>>>> [ERROR] at >>>>> 
org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415) >>>>> [ERROR] at >>>>> org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356) >>>>> [ERROR] Caused by: >>>>> org.apache.maven.surefire.booter.SurefireBooterForkException: The forked >>>>> VM >>>>> terminated without properly saying goodbye. VM crash or System.exit >>>>> called? >>>>> [ERROR] Command was /bin/sh -c cd >>>>> /home/juha/git/apache-flink/flink-tests/target && >>>>> /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m >>>>> -Dmvn.forkNumber=3 -XX:+UseG1GC -jar >>>>> /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar >>>>> /home/juha/git/apache-flink/flink-tests/target/surefire >>>>> 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp >>>>> surefire_122313349068739873924160tmp >>>>> [ERROR] Error occurred in starting fork, check output in log >>>>> [ERROR] Process Exit Code: 137 >>>>> [ERROR] Crashed tests: >>>>> [ERROR] >>>>> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase >>>>> [ERROR] at >>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669) >>>>> [ERROR] at >>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115) >>>>> [ERROR] at >>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444) >>>>> [ERROR] at >>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420) >>>>> [ERROR] at >>>>> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) >>>>> [ERROR] at >>>>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) >>>>> [ERROR] at >>>>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) >>>>> [ERROR] at java.base/java.lang.Thread.run(Thread.java:834) >>>>> [ERROR] -> [Help 1] >>>>> [ERROR] >>>>> [ERROR] To see the full stack trace of the errors, re-run Maven with >>>>> the -e switch. >>>>> [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
>>>>> [ERROR] >>>>> [ERROR] For more information about the errors and possible solutions, >>>>> please read the following articles: >>>>> [ERROR] [Help 1] >>>>> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException >>>>> [ERROR] >>>>> [ERROR] After correcting the problems, you can resume the build with >>>>> the command >>>>> [ERROR] mvn <goals> -rf :flink-tests >>>>> >>>>> The jvmdump-files look like this: >>>>> >>>>> # Created at 2020-10-19T18:14:22.869 >>>>> java.io.IOException: Stream closed >>>>> at >>>>> java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176) >>>>> at >>>>> java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289) >>>>> at >>>>> java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351) >>>>> at >>>>> java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) >>>>> at >>>>> java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) >>>>> at >>>>> java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) >>>>> at >>>>> java.base/java.io.InputStreamReader.read(InputStreamReader.java:185) >>>>> at java.base/java.io.Reader.read(Reader.java:189) >>>>> at java.base/java.util.Scanner.readInput(Scanner.java:882) >>>>> at >>>>> java.base/java.util.Scanner.findWithinHorizon(Scanner.java:1796) >>>>> at java.base/java.util.Scanner.hasNextLine(Scanner.java:1610) >>>>> at >>>>> org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354) >>>>> at >>>>> org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190) >>>>> at >>>>> org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123) >>>>> at >>>>> org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214) >>>>> at >>>>> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) >>>>> at >>>>> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) >>>>> at >>>>> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) >>>>> at >>>>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) >>>>> at >>>>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) >>>>> at java.base/java.lang.Thread.run(Thread.java:834) >>>>> >>>>> >>>>> # Created at 2020-10-19T18:14:22.870 >>>>> System.exit() or native command error interrupted process checker. 
>>>>> java.lang.IllegalStateException: error [STOPPED] to read process 898133 >>>>> at >>>>> org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145) >>>>> at >>>>> org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124) >>>>> at >>>>> org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214) >>>>> at >>>>> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) >>>>> at >>>>> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) >>>>> at >>>>> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) >>>>> at >>>>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) >>>>> at >>>>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) >>>>> at java.base/java.lang.Thread.run(Thread.java:834) >>>>> >>>>> >>>>> I found some JIRA tickets with " The forked VM terminated without >>>>> properly saying goodbye": >>>>> >>>>> https://issues.apache.org/jira/browse/FLINK-18375 >>>>> https://issues.apache.org/jira/browse/FLINK-2466 >>>>> >>>>> I don't see how these could explain the issue I'm witnessing.... >>>>> >>>>> I wonder if the issue is related to the VM running "too hot". 'top' >>>>> shows very high load averages. >>>>> >>>>> The crash can be reproduced. >>>>> >>>>> Regards, >>>>> Juha >>>>> >>>>>