On 21/11/2023 12.47, Marc-André Lureau wrote:
Hi
On Tue, Nov 21, 2023 at 1:45 PM Thomas Huth <th...@redhat.com> wrote:
On 21/11/2023 10.39, Marc-André Lureau wrote:
Hi
On Mon, Nov 20, 2023 at 5:36 PM Nicholas Piggin <npig...@gmail.com> wrote:
On Mon Nov 20, 2023 at 10:06 PM AEST, Marc-André Lureau wrote:
Hi
On Thu, Nov 16, 2023 at 3:54 PM Nicholas Piggin <npig...@gmail.com> wrote:
If a chardev socket is not read, it will eventually fill and QEMU
can block attempting to write to it. A difficult bug in avocado
tests where the console socket was not being read from caused this
hang.
warn if a chardev write is blocked for 100ms.
Signed-off-by: Nicholas Piggin <npig...@gmail.com>
---
This is not necessary for the fix but it does trigger in the
failing avocado test without the previous patch applied. Maybe
it would be helpful?
Thanks,
Nick
chardev/char.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/chardev/char.c b/chardev/char.c
index 996a024c7a..7c375e3cc4 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
{
ChardevClass *cc = CHARDEV_GET_CLASS(s);
int res = 0;
+ int nr_retries = 0;
+
*offset = 0;
qemu_mutex_lock(&s->chr_write_lock);
@@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
} else {
g_usleep(100);
}
+ if (++nr_retries == 1000) { /* 100ms */
+ warn_report("Chardev '%s' write blocked for > 100ms, "
+ "socket buffer full?", s->label);
+ }
That shouldn't happen, the frontend should poll and only write when it
can. What is the qemu command being used here?
You can follow it through the thread here
https://lore.kernel.org/qemu-devel/zvt-by9yor69q...@redhat.com/
In short, a console device is attached to a socket pair and nothing
ever reads from it. It eventually fills, and writing to it fails
indefinitely here.
It can be reproduced with:
make check-avocado
AVOCADO_TESTS=tests/avocado/reverse_debugging.py:test_ppc64_pseries
How reliably? I tried 10/10.
It used to fail for me every time I tried - but the fix has already been
merged yesterday (commit cd43f00524070c026), so if you updated today, you'll
see the test passing again.
Ok so the "frontend" is spapr-vty and there:
void vty_putchars(SpaprVioDevice *sdev, uint8_t *buf, int len)
{
SpaprVioVty *dev = VIO_SPAPR_VTY_DEVICE(sdev);
/* XXX this blocks entire thread. Rewrite to use
* qemu_chr_fe_write and background I/O callbacks */
qemu_chr_fe_write_all(&dev->chardev, buf, len);
}
(grep "XXX this blocks", we have a lot...)
Can H_PUT_TERM_CHAR return the number of bytes written?
You can find the definition of the hypercall in the LoPAPR spec:
https://elinux.org/images/a/a4/LoPAPR_DRAFT_v11_24March2016.pdf
... and if I get it right, it does not have a way to tell the guest the
amount of accepted characters. But it could return H_BUSY if it is not able
to enqueue all characters at once. As far as I can see, this will make the
guest spin until it can finally send out the characters... not sure whether
that's so much better...?
Thomas