bug#35350: Some compile output still leaks through with --verbosity=1

Mark H Weaver Tue, 30 Apr 2019 13:45:28 -0700

Hi Ludovic,

Ludovic Courtès <l...@gnu.org> writes:

> Mark H Weaver <m...@netris.org> skribis:
>
>> Ludovic Courtès <l...@gnu.org> writes:
>>
>>> The third read(2) call here ends on a partial UTF-8 sequence for LEFT
>>> SINGLE QUOTATION MARK (we get the first two bytes of a three byte
>>> sequence.)
>>>
>>> What happens is that ‘process-stderr’ in (guix store) gets that byte
>>> string from the daemon, passes it through ‘read-maybe-utf8-string’,
>>> which replaces the last two bytes with REPLACEMENT CHARACTER, which is
>>> itself a 3-byte sequence.
>>
>> It seems to me that what's needed here is to save the UTF-8 decoder
>> state between calls to 'process-stderr'.
>
> So there are two things.  To fix the issue you reported (build output
> that goes through), I think we must simply turn off UTF-8 decoding from
> ‘process-stderr’ and leave that entirely to ‘build-event-output-port’.

Can we assume that UTF-8 is the appropriate encoding for
(current-build-output-port)?  My interpretation of the Guix manual entry
for 'current-build-output-port' suggests that the answer should be "no".

Also, in your previous message you wrote:

  The problem is the first layer of UTF-8 decoding that happens in
  ‘process-stderr’, in the ‘%stderr-next’ case.  We would need to
  disable it, but only if the build output port is
  ‘build-event-output-port’ (i.e., it’s capable of interpreting
  “multiplexed build output” correctly.)

It sounds like you're suggesting that 'process-stderr' should look to
see if (current-build-output-port) is a 'build-event-output-port', and
in that case it should use binary I/O primitives to write raw binary
data to it, otherwise it should use text I/O primitives and write
characters to it.  Do I understand correctly?

IMO, it would be cleaner to treat 'build-event-output-port' uniformly,
and specifically as a textual port of unknown encoding.

What do you think?

> However, ‘build-event-output-port’ would still fail to properly decode
> split UTF-8 sequences, and for that we’d need to preserve decoder
> state as you describe.

I would suggest changing 'build-event-output-port' to create an R6RS
custom *textual* output port, so that it wouldn't have to worry about
encodings at all, and it would only be given whole characters.
Internally, it would be doing exactly what you suggest above, but those
details would be encapsulated within the custom textual port.

However, I don't think we can use Guile's current implementation of R6RS
custom textual output ports, which are currently built on Guile's legacy
soft ports, which I suspect have a similar bug with multibyte characters
sometimes being split (see 'soft_port_write' in vports.c).

Having said all of this, my suggestions would ultimately entail having
two separate places along the stderr pipeline where 'utf8->string!'
would be used, and maybe that's too much until we have a more optimized
C implementation of it.

Thoughts?

      Mark

bug#35350: Some compile output still leaks through with --verbosity=1

Reply via email to