On 1 June 2018 at 13:28, Sergio Lopez <s...@redhat.com> wrote: > On Fri, Jun 01, 2018 at 02:16:59PM +0200, Paolo Bonzini wrote: >> On 01/06/2018 14:05, Sergio Lopez wrote: >> >>> Instead of adding explicit handling of EPIPE, shouldn't the code be >> >>> rewritten to treat -1 return && errno != EAGAIN as fatal? >> >> Yes, exactly this code is already broken for every single errno >> >> value, not simply EPIPE. It needs fixing to treat '-1' return code >> >> correctly instead of retrying everything. >> > Given that EAGAIN is already taken care of in >> > chardev/char.c:qemu_chr_write_buffer, in which cases should we retry? Or >> > should we just drop all the tsr_retry logic? >> >> It's not handled there, the call from qemu_chr_fe_write has write_all == >> false. > > You're right. So should we just retry only for -1 && errno == EAGAIN and > just ignore the error otherwise?
I don't think we should make all of qemu_chr_fe_write()'s callers have to handle EAGAIN. That function already has a way to tell the caller that it didn't consume any data, which is to return 0. (EAGAIN is a curse and we should strive to avoid it spreading its poison into areas of the codebase that we can keep it out of.) In general there are three cases I think qemu_chr_fe_write() callers might care about: * data was consumed (return >0) * data wasn't consumed, try again later (return 0) * data wasn't consumed, and a later attempt won't work either (return -1) NB that hw/char/serial.c is not the only serial device using this pattern and probably needing adjustment (though I think it's the only one with the complicated tsr_retry logic). In the patch that started this thread the report is that we use lots of CPU. Can you explain why that happens? I was expecting that we would set up the qemu_chr_fe_add_watch() on the dead chr FE and it would just never fire since the FE can never accept further data... thanks -- PMM