čt 4. 4. 2024 v 15:37 odesílatel Jakub Kicinski <k...@kernel.org> napsal:
>
> On Thu, 4 Apr 2024 07:42:45 +0200 Jaroslav Pulchart wrote:
> > We do not have much progress
>
> Random thought - do you have KFENCE enabled?
> It's sufficiently low overhead to run in production and maybe it could
> help catch the bug? You also hit some inexplicable bug in the Intel
> driver, IIRC, there may be something odd going on.. (it's not all
> happening on a single machine, right?)

We have KFENCE enabled.

Issue was observed at multiple servers. It is not a problem to reproduce it
everywhere where we deploy Loki service. The trigger is: I click
once/twice "run query" (LogQL) button by Grafana UI. the Loki is
starting to load data from the minio cluster at a speed of ~2GB/s and
almost immediately it crashes.

The Intel ICE driver is in my suspicion as well, it will not be for
the first time when we are hitting some bugs there. I will try one
testing server where we have different NIC vendor later.

Reply via email to