Cool!

Best,
tison.

Lari Hotari <lhot...@apache.org> wrote on Wed, Jun 28, 2023 at 23:50:

> The root cause appears to be different than the geoip database download in
> Elastic.
> By default, Elastic will stop writes when the disk usage goes over 90%.
> I've now added a setting to disable the disk usage threshold in the PR [1].
> A similar setting is applied in elastic-github-actions [2].
> Once the build passes for the PR [3], I'll proceed with merging it to
> unblock Pulsar CI.
>
> -Lari
>
> [1] - https://github.com/lhotari/pulsar/commit/d959eb4929d4192fb56c140a8b590e0ba25d866b
> [2] - https://github.com/elastic/elastic-github-actions/blob/562b8b6ae4677da97273ff6bc4d630ce96ecbaa5/elasticsearch/run-elasticsearch.sh#L41
> [3] - https://github.com/apache/pulsar/pull/20671
>
> On 2023/06/28 13:05:30 tison wrote:
> > > I guess nobody proceeded in disabling the test.
> >
> > Yeah. I'm not in a hurry but bring up the case. It seems no one is blocked
> > urgently and we have time to investigate it :D
> >
> > Thanks for your investigation and patch! Indeed.
> >
> > Best,
> > tison.
> >
> >
> > Lari Hotari <lhot...@apache.org> wrote on Wed, Jun 28, 2023 at 20:58:
> >
> > > I guess nobody proceeded in disabling the test.
> > >
> > > I have investigated the problem and written a short guide about
> > > investigating integration tests
> > > in the real GitHub Actions VM environment using ssh.
> > > This guide is a comment on the issue:
> > > https://github.com/apache/pulsar/issues/20661#issuecomment-1611216464
> > >
> > > While investigating the failing test, the test started suddenly passing
> > > and I couldn't reproduce the issue so I didn't catch the problem yet. This
> > > also means that the problem is transient.
> > >
> > > I suspect that it's the geoip database download that Elastic container
> > > does at startup time which is causing issues. There's also an elastic issue
> > > #92335 about the default geoip download [1]. This can be disabled by
> > > setting `ingest.geoip.downloader.enabled` to `false` in the container
> > > environment.
> > >
> > > geoip download might not be the root cause, but I'm now testing a change
> > > that disables the geoip database download and enables logging for Elastic
> > > container stdout and stderr output.
> > >
> > > The PR is https://github.com/apache/pulsar/pull/20671 .
> > >
> > > -Lari
> > >
> > > [1] https://github.com/elastic/elasticsearch/pull/92335
> > >
> > > On 2023/06/28 01:52:14 tison wrote:
> > > > See also https://github.com/apache/pulsar/issues/20661
> > > >
> > > > Enrico and I both verified that it works well locally, so that can be an
> > > > env issue or unstable dependency - I checked the ES image not changed,
> > > > though.
> > > >
> > > > If we cannot locate the cause quickly, perhaps disable the test to unblock
> > > > other PRs first?
> > > >
> > > > I tried to read the code, but there is no trivial cause (even the test
> > > > passed locally). The log indicates that statistics received one message
> > > > instead of 20 expected, but as other test cases passed, it may not be a
> > > > kernel logic issue.
> > > >
> > > > Best,
> > > > tison.
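
For anyone reproducing this locally, the two Elastic settings discussed in this thread could be passed to a container roughly as in the sketch below. This is not the change in the PR itself; it only builds a `docker run` argument list. `ingest.geoip.downloader.enabled` is the setting named above. `cluster.routing.allocation.disk.threshold_enabled` is my assumption for what "disable the disk usage threshold" refers to, and the image tag and the helper name `docker_run_args` are placeholders, not taken from the Pulsar build.

```python
import shlex

# Placeholder image tag (assumption); substitute whatever the test suite uses.
IMAGE = "docker.elastic.co/elasticsearch/elasticsearch:8.5.3"

# Settings passed through the container environment.
ES_SETTINGS = {
    # Named in the thread: skip the geoip database download at startup.
    "ingest.geoip.downloader.enabled": "false",
    # Assumed name of the disk usage threshold switch mentioned in the thread.
    "cluster.routing.allocation.disk.threshold_enabled": "false",
}


def docker_run_args(image, settings):
    """Build a `docker run` argument list with one --env flag per setting."""
    args = ["docker", "run", "--rm", "-p", "9200:9200"]
    for key, value in settings.items():
        args += ["--env", f"{key}={value}"]
    args.append(image)
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(docker_run_args(IMAGE, ES_SETTINGS)))
```

The sketch leaves out everything else a real single-node test container would need (discovery mode, security settings, heap size), since none of that is discussed here.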