I guess nobody proceeded in disabling the test.

I have investigated the problem and written a short guide about investigating 
integration tests
in the real GitHub Actions VM environment using ssh.
This guide is a comment on the issue:
https://github.com/apache/pulsar/issues/20661#issuecomment-1611216464

While investigating the failing test, the test started suddenly passing and I 
couldn't reproduce the issue so I didn't catch the problem yet. This also means 
that the problem is transient.

I suspect that it's the geoip database download that Elastic container does at 
startup time which is causing issues. There's also an elastic issue #92335 
about the default geoip download [1]. This can be disabled by setting 
`ingest.geoip.downloader.enabled` to `false` in the container environment.

geoip download might not be the root cause, but I'm now testing a change that 
disables the geoip database download and enables logging for Elastic container 
stdout and stderr output.

The PR is https://github.com/apache/pulsar/pull/20671 .

-Lari

[1] https://github.com/elastic/elasticsearch/pull/92335

On 2023/06/28 01:52:14 tison wrote:
> See also https://github.com/apache/pulsar/issues/20661
> 
> Enrico and I both verified that it works well locally, so that can be an
> env issue or unstable dependency - I checked the ES image not changed,
> though.
> 
> If we cannot locate the cause quickly, perhaps disable the test to unblock
> other PRs first?
> 
> I tried to read the code, but there is no trivial cause (even the test
> passed locally). The log indicates that statistics received one message
> instead of 20 expected, but as other test cases passed, it may not be a
> kernel logic issue.
> 
> Best,
> tison.
> 

Reply via email to