I guess nobody proceeded in disabling the test. I have investigated the problem and written a short guide about investigating integration tests in the real GitHub Actions VM environment using ssh. This guide is a comment on the issue: https://github.com/apache/pulsar/issues/20661#issuecomment-1611216464
While investigating the failing test, the test started suddenly passing and I couldn't reproduce the issue so I didn't catch the problem yet. This also means that the problem is transient. I suspect that it's the geoip database download that Elastic container does at startup time which is causing issues. There's also an elastic issue #92335 about the default geoip download [1]. This can be disabled by setting `ingest.geoip.downloader.enabled` to `false` in the container environment. geoip download might not be the root cause, but I'm now testing a change that disables the geoip database download and enables logging for Elastic container stdout and stderr output. The PR is https://github.com/apache/pulsar/pull/20671 . -Lari [1] https://github.com/elastic/elasticsearch/pull/92335 On 2023/06/28 01:52:14 tison wrote: > See also https://github.com/apache/pulsar/issues/20661 > > Enrico and I both verified that it works well locally, so that can be an > env issue or unstable dependency - I checked the ES image not changed, > though. > > If we cannot locate the cause quickly, perhaps disable the test to unblock > other PRs first? > > I tried to read the code, but there is no trivial cause (even the test > passed locally). The log indicates that statistics received one message > instead of 20 expected, but as other test cases passed, it may not be a > kernel logic issue. > > Best, > tison. >