Cool!

Best,
tison.


Lari Hotari <lhot...@apache.org> wrote on Wed, Jun 28, 2023 at 23:50:

> The root cause appears to be different than the geoip database download in
> Elastic.
> By default, Elastic will stop writes when the disk usage goes over 90%.
> I've now added a setting to disable the disk usage threshold in the PR [1].
> A similar setting is applied in elastic-github-actions [2].
> Once the build passes for the PR [3], I'll proceed with merging it to
> unblock Pulsar CI.
>
> -Lari
>
> [1] -
> https://github.com/lhotari/pulsar/commit/d959eb4929d4192fb56c140a8b590e0ba25d866b
> [2] -
> https://github.com/elastic/elastic-github-actions/blob/562b8b6ae4677da97273ff6bc4d630ce96ecbaa5/elasticsearch/run-elasticsearch.sh#L41
> [3] - https://github.com/apache/pulsar/pull/20671
>
> On 2023/06/28 13:05:30 tison wrote:
> > > I guess nobody proceeded in disabling the test.
> >
> > Yeah. I'm not in a hurry; I just wanted to bring up the case. It seems no
> > one is urgently blocked and we have time to investigate it :D
> >
> > Thanks for your investigation and patch! Indeed.
> >
> > Best,
> > tison.
> >
> >
> > Lari Hotari <lhot...@apache.org> wrote on Wed, Jun 28, 2023 at 20:58:
> >
> > > I guess nobody proceeded in disabling the test.
> > >
> > > I have investigated the problem and written a short guide about
> > > investigating integration tests
> > > in the real GitHub Actions VM environment using ssh.
> > > This guide is a comment on the issue:
> > > https://github.com/apache/pulsar/issues/20661#issuecomment-1611216464
> > >
> > > While investigating the failing test, the test suddenly started passing
> > > and I couldn't reproduce the issue, so I haven't caught the problem yet.
> > > This also means that the problem is transient.
> > >
> > > I suspect that the geoip database download that the Elastic container
> > > does at startup time is causing the issues. There's also an Elastic
> > > issue #92335 about the default geoip download [1]. This can be disabled
> > > by setting `ingest.geoip.downloader.enabled` to `false` in the container
> > > environment.
> > >
> > > The geoip download might not be the root cause, but I'm now testing a
> > > change that disables the geoip database download and enables logging of
> > > the Elastic container's stdout and stderr output.
> > >
> > > The PR is https://github.com/apache/pulsar/pull/20671 .
> > >
> > > -Lari
> > >
> > > [1] https://github.com/elastic/elasticsearch/pull/92335
> > >
> > > On 2023/06/28 01:52:14 tison wrote:
> > > > See also https://github.com/apache/pulsar/issues/20661
> > > >
> > > > Enrico and I both verified that it works well locally, so it could be
> > > > an env issue or an unstable dependency. I checked that the ES image
> > > > has not changed, though.
> > > >
> > > > If we cannot locate the cause quickly, perhaps disable the test to
> > > > unblock other PRs first?
> > > >
> > > > I tried to read the code, but there is no obvious cause (the test even
> > > > passed locally). The log indicates that statistics received one message
> > > > instead of the 20 expected, but as other test cases passed, it may not
> > > > be a kernel logic issue.
> > > >
> > > > Best,
> > > > tison.
> > > >
> > >
> >
>
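The two Elastic settings discussed in this thread could be collected for a test container roughly as sketched below. This is only an illustration, not the change made in the PR: the helper names `elastic_test_env` and `to_docker_args` are hypothetical, and the disk threshold setting name `cluster.routing.allocation.disk.threshold_enabled` is an assumption, since the thread describes the setting but does not name it. Only `ingest.geoip.downloader.enabled` is taken from the thread itself.

```python
# Hypothetical helpers: gather Elasticsearch settings for a CI test container
# and render them as `docker run --env` arguments.


def elastic_test_env(disable_geoip=True, disable_disk_threshold=True):
    """Return Elasticsearch settings as a dict of setting name -> value."""
    env = {}
    if disable_geoip:
        # Named in this thread: skips the geoip database download at startup.
        env["ingest.geoip.downloader.enabled"] = "false"
    if disable_disk_threshold:
        # Assumption: the thread only mentions "a setting to disable the disk
        # usage threshold"; this is the Elasticsearch setting assumed here.
        env["cluster.routing.allocation.disk.threshold_enabled"] = "false"
    return env


def to_docker_args(env):
    """Render settings as a flat list of `--env key=value` arguments."""
    args = []
    for key, value in sorted(env.items()):
        args += ["--env", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_docker_args(elastic_test_env())))
```

Keeping the settings in one dict makes it easy to toggle each workaround separately while checking which one actually affects the transient failure.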
