[email protected] (Ludovic Courtès) writes:
> Andy Wingo <[email protected]> skribis:
>
>> The (newline) function can write CRLF
>> The ~% format directive should DTRT
>> read-line should DTRT
>
> IMO the correct abstraction here is transcoders à la R6RS.

Agreed.

> The problem is that scm_t_port doesn’t have any slot to specify the
> EOL style, but it would need one.

I think it's important that we find a way to add new information to
scm_t_port in 2.0. We also need this to properly fix the BOM issue.

Here's a proposal: let's slightly redefine the meaning of 'input_cd' and
'output_cd'. Users are already unable to use these, because in the
common case (UTF-8) they are both -1.

Instead of having 'input_cd' and 'output_cd' point directly to the
platform's iconv_t structures, let's have them point to our own internal
structure(s) that hold the needed transcoder state. This could include
things like the state for internally implemented encoding(s) (e.g. UTF-8
BOM handling), EOL style, and iconv_t pointer(s) if appropriate.
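
To make the idea concrete, here is a rough sketch in C of what such a
structure might look like. Every name in it (port_transcoder_t,
eol_style_t, bom_state_t, make_port_transcoder, port_put_newline) is
hypothetical and not existing Guile API. sketch_port_t is only a
stand-in for the two relevant slots of scm_t_port, which I am assuming
can hold a pointer.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical EOL styles a port could be configured with.  */
typedef enum { EOL_LF, EOL_CRLF } eol_style_t;

/* Hypothetical state for internally handled BOM detection.  */
typedef enum { BOM_UNCHECKED, BOM_ABSENT, BOM_CONSUMED } bom_state_t;

/* Hypothetical per-direction transcoder state.  */
typedef struct
{
  iconv_t cd;       /* (iconv_t) -1 when no iconv descriptor is in use */
  eol_style_t eol;  /* line-ending convention for this direction */
  bom_state_t bom;  /* BOM handling state for internal encodings */
} port_transcoder_t;

/* Stand-in for the two slots of scm_t_port discussed here; this is an
   assumption for illustration, not the real structure.  */
typedef struct
{
  void *input_cd;
  void *output_cd;
} sketch_port_t;

/* Allocate a transcoder with no iconv descriptor attached.
   Returns NULL if allocation fails.  */
port_transcoder_t *
make_port_transcoder (eol_style_t eol)
{
  port_transcoder_t *t = malloc (sizeof *t);
  if (t == NULL)
    return NULL;
  t->cd = (iconv_t) -1;
  t->eol = eol;
  t->bom = BOM_UNCHECKED;
  return t;
}

/* Copy the port's output line terminator into BUF (no trailing NUL).
   Returns the number of bytes written, or 0 if BUF is too small.  */
size_t
port_put_newline (const sketch_port_t *port, char *buf, size_t size)
{
  const port_transcoder_t *t = port->output_cd;
  const char *eol;
  size_t len;

  assert (t != NULL);
  eol = (t->eol == EOL_CRLF) ? "\r\n" : "\n";
  len = strlen (eol);
  if (len > size)
    return 0;
  memcpy (buf, eol, len);
  return len;
}
```

With something along these lines, (newline) and the ~% format directive
could consult the output transcoder for the EOL style, and read-line
could consult the input one, without adding any new slot to scm_t_port.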

What do you think?

Mark