https://github.com/apache/pulsar/pull/20671 is now merged.

For existing PRs that are blocked, push new changes to the PR, or close and
re-open it, to pick up the fix that has been merged to the master branch.

I have also cherry-picked the fix to branch-3.0 and branch-2.11.
There's an open PR for a backport to branch-2.10:
https://github.com/apache/pulsar/pull/20676
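
For reference, the fix disables Elasticsearch's disk usage threshold for the
test container (details are in the quoted message below). Here is a minimal
sketch of the idea in Python; it is not the actual change in the PR. The
setting name `cluster.routing.allocation.disk.threshold_enabled` is an
assumption about which Elasticsearch setting is meant, and the helper names
`docker_env_args` and `disk_usage_percent` are hypothetical.

```python
import shutil

# Settings discussed in this thread, passed to the container as environment
# variables. The disk threshold key is an assumption: the thread only says
# "a setting to disable the disk usage threshold".
ES_ENV = {
    "ingest.geoip.downloader.enabled": "false",
    "cluster.routing.allocation.disk.threshold_enabled": "false",
}


def docker_env_args(env):
    """Turn a settings dict into `docker run` style `-e key=value` arguments."""
    args = []
    for key, value in sorted(env.items()):
        args += ["-e", f"{key}={value}"]
    return args


def disk_usage_percent(path="."):
    """Return the used disk space of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    print(" ".join(docker_env_args(ES_ENV)))
    print(f"disk usage: {disk_usage_percent():.1f}%")
```

Running `disk_usage_percent()` on a CI runner is a quick way to check whether
the disk is already close to the 90% mark mentioned in the quoted message.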

-Lari


On 2023/06/28 15:50:12 Lari Hotari wrote:
> The root cause appears to be different than the geoip database download in 
> Elastic.
> By default, Elastic will stop writes when the disk usage goes over 90%. I've 
> now added a setting to disable the disk usage threshold in the PR [1].
> A similar setting is applied in elastic-github-actions [2].
> Once the build passes for the PR [3], I'll proceed with merging it to unblock 
> Pulsar CI.
> 
> -Lari
> 
> [1] - https://github.com/lhotari/pulsar/commit/d959eb4929d4192fb56c140a8b590e0ba25d866b
> [2] - https://github.com/elastic/elastic-github-actions/blob/562b8b6ae4677da97273ff6bc4d630ce96ecbaa5/elasticsearch/run-elasticsearch.sh#L41
> [3] - https://github.com/apache/pulsar/pull/20671
> 
> On 2023/06/28 13:05:30 tison wrote:
> > > I guess nobody proceeded in disabling the test.
> > 
> > Yeah. I'm not in a hurry; I just wanted to bring up the case. It seems no
> > one is urgently blocked, so we have time to investigate it :D
> > 
> > Thanks for your investigation and patch! Indeed.
> > 
> > Best,
> > tison.
> > 
> > 
> > On Wed, Jun 28, 2023 at 20:58, Lari Hotari <lhot...@apache.org> wrote:
> > 
> > > I guess nobody proceeded in disabling the test.
> > >
> > > I have investigated the problem and written a short guide about
> > > investigating integration tests
> > > in the real GitHub Actions VM environment using ssh.
> > > This guide is a comment on the issue:
> > > https://github.com/apache/pulsar/issues/20661#issuecomment-1611216464
> > >
> > > While I was investigating the failing test, it suddenly started passing
> > > and I couldn't reproduce the issue, so I haven't caught the problem yet.
> > > This also means that the problem is transient.
> > >
> > > I suspect that the geoip database download that the Elastic container
> > > does at startup time is causing the issues. There's also an Elastic
> > > issue #92335 about the default geoip download [1]. The download can be
> > > disabled by setting `ingest.geoip.downloader.enabled` to `false` in the
> > > container environment.
> > >
> > > The geoip download might not be the root cause, but I'm now testing a
> > > change that disables the geoip database download and enables logging of
> > > the Elastic container's stdout and stderr output.
> > >
> > > The PR is https://github.com/apache/pulsar/pull/20671 .
> > >
> > > -Lari
> > >
> > > [1] https://github.com/elastic/elasticsearch/pull/92335
> > >
> > > On 2023/06/28 01:52:14 tison wrote:
> > > > See also https://github.com/apache/pulsar/issues/20661
> > > >
> > > > Enrico and I both verified that it works well locally, so it may be an
> > > > environment issue or an unstable dependency. I checked that the ES
> > > > image has not changed, though.
> > > >
> > > > If we cannot locate the cause quickly, perhaps disable the test to
> > > > unblock other PRs first?
> > > >
> > > > I tried to read the code, but there is no obvious cause (the test even
> > > > passed locally). The log indicates that statistics received one message
> > > > instead of the 20 expected, but as other test cases passed, it may not
> > > > be a kernel logic issue.
> > > >
> > > > Best,
> > > > tison.
> > > >
> > >
> > 
> 
