l...@gnu.org (Ludovic Courtès) writes:
> Mike Gran writes:
>
>> It would be a trivial function to write, of course, but there is a
>> c-strcasecmp func in gnulib.
>
> Yes, better use that one.
>
> (Just add ‘c-strcase’ in m4/gnulib-cache.m4, run ‘gnulib-tool --update’
> with Gnulib v0.0-7865-ga8
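For readers wondering what the "trivial function" would look like: below is a minimal sketch of a case-insensitive comparison that folds only ASCII letters, so its result does not depend on the current locale. POSIX leaves the behaviour of strcasecmp unspecified outside the POSIX locale, which is presumably why a C-locale variant is preferred for encoding names. The names ascii_tolower and ascii_strcasecmp are hypothetical and this is not Gnulib's implementation.

```c
/* Hypothetical helper (not Gnulib's code): fold only ASCII 'A'..'Z'
   to lower case and leave every other value untouched.  */
static int
ascii_tolower (int c)
{
  return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c;
}

/* Compare S1 and S2 ignoring ASCII case only.  Returns a negative
   value, zero, or a positive value, like strcmp.  */
int
ascii_strcasecmp (const char *s1, const char *s2)
{
  const unsigned char *p1 = (const unsigned char *) s1;
  const unsigned char *p2 = (const unsigned char *) s2;
  int c1, c2;

  do
    {
      c1 = ascii_tolower (*p1++);
      c2 = ascii_tolower (*p2++);
    }
  while (c1 != '\0' && c1 == c2);

  return c1 - c2;
}
```

With this sketch, ascii_strcasecmp ("utf-16", "UTF-16") compares equal, while "UTF-16" sorts before "UTF-16LE".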
Mike Gran writes:
+ /* If the specified encoding is UTF-16 or UTF-32, then make
+ that more precise by deciding what endianness to use. */
+ if (strcasecmp (pt->encoding, "UTF-16") == 0)
+ precise_encoding = decide_utf16_encoding (port, mode);
>>
>>> + /* If the specified encoding is UTF-16 or UTF-32, then make
>>> + that more precise by deciding what endianness to use. */
>>> + if (strcasecmp (pt->encoding, "UTF-16") == 0)
>>> + precise_encoding = decide_utf16_encoding (port, mode);
>>> + else if (strcas
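To illustrate what "deciding what endianness to use" can mean for an input stream, here is a rough sketch that inspects the first two bytes of a buffer. It is not the patch's decide_utf16_encoding (which takes a port and a mode); guess_utf16_encoding is a hypothetical name, and falling back to big-endian when no BOM is present is an assumption based on the Unicode default.

```c
#include <stddef.h>

/* Hypothetical sketch: choose a precise UTF-16 variant from the first
   bytes of BUF.  FE FF is a big-endian BOM, FF FE a little-endian one.
   Without a BOM, assume big-endian (an assumption of this sketch).  */
const char *
guess_utf16_encoding (const unsigned char *buf, size_t len)
{
  if (len >= 2)
    {
      if (buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE";
      if (buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE";
    }
  return "UTF-16BE";
}
```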
Hi Andy,
Andy Wingo writes:
> On Wed 03 Apr 2013 22:33, Mark H Weaver writes:
>
>> + /* If we just read a BOM in an encoding that recognizes them,
>> + then silently consume it and read another code point. */
>> + if (SCM_UNLIKELY (*codepoint == SCM_UNICODE_BOM
>>
Hi. The following review applies to the wrong version of this patch.
I'll go ahead and post it anyway.
On Wed 03 Apr 2013 22:33, Mark H Weaver writes:
> + /* If we just read a BOM in an encoding that recognizes them,
> + then silently consume it and read another code point.
start. Write a BOM if appropriate.
* doc/ref/api-io.texi (BOM Handling): New node.
* test-suite/tests/ports.test ("set-port-encoding!, wrong encoding"):
Adapt test to cope with the fact that 'set-port-encoding!' does not
immediately open the iconv descriptors.
(bv-read
tream start. Write a BOM if appropriate.
* doc/ref/api-io.texi (BOM Handling): New node.
* test-suite/tests/ports.test ("set-port-encoding!, wrong encoding"):
Adapt test to cope with the fact that 'set-port-encoding!' does not
immediately open the iconv descript
Hi Mark
>>> Here's the new patch. Any more suggestions?
There are a couple of lines in your doc patch that aren't quite right.
"@code{UTF-16BE}, @code{UTF-16LE}, @code{UTF-16BE}, or @code{UTF-16LE}"
I assume that two of these should be UTF-32.
Also
"This is intended to multiple logical te
e tweaks.
Thanks,
Mark
From f849f9a3f6babd87088d39369442a7f429762cec Mon Sep 17 00:00:00 2001
From: Mark H Weaver
Date: Wed, 3 Apr 2013 04:22:04 -0400
Subject: [PATCH] Improve handling of Unicode byte-order marks (BOMs).
* libguile/ports-internal.h (struct
Mark H Weaver writes:
> l...@gnu.org (Ludovic Courtès) writes:
>> Woow, well thought out. The semantics seem good. (It’s interesting to
>> see how BOMs complicate things, but that’s life, I guess.)
>>
>> The patch looks good to me. The test suite is nice. It doesn’t seem to
>> cover all the
precise_encoding = decide_utf32_encoding (port, mode);
>
> Shouldn’t it be strcasecmp? (Actually there are other uses of strcmp
> already, but I think it’s a mistake.)
Ouch, good catch! Indeed, we already had some bugs because of this. I
pushed a fix for the existing bugs to stable
Hello, Mark!
Mark H Weaver writes:
> * All kinds of streams are supported in a uniform way: files, pipes,
> sockets, terminals, etc.
>
> * As specified in Unicode 6.2, BOMs are only handled specially at the
> start of a stream, and only if the encoding is set to "UTF-16" or
> "UTF-32". B
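The rule quoted above (a BOM is special only at the start of a stream, and only for the "UTF-16" and "UTF-32" encodings) could be sketched roughly as follows. The struct and function names are hypothetical and are not taken from the patch; the sketch only shows the shape of the check on already decoded code points.

```c
#include <stdbool.h>
#include <stdint.h>

#define UNICODE_BOM 0xFEFF

/* Hypothetical decoder state, for illustration only.  */
struct decoder_state
{
  bool at_start;   /* true until the first code point has been seen */
  bool bom_aware;  /* true if the encoding was given as "UTF-16" or "UTF-32" */
};

/* Return true if CP should be delivered to the reader, false if it is
   a leading BOM that should be silently consumed.  */
bool
deliver_code_point (struct decoder_state *st, uint32_t cp)
{
  bool skip = st->at_start && st->bom_aware && cp == UNICODE_BOM;

  st->at_start = false;   /* only the very first code point is special */
  return !skip;
}
```

A U+FEFF that appears later in the stream is delivered like any other character, since at_start has already been cleared by then.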
.
Mark
From d8d37d5519ca61961b70cb3051ccca2be7d4affa Mon Sep 17 00:00:00 2001
From: Mark H Weaver
Date: Wed, 3 Apr 2013 04:22:04 -0400
Subject: [PATCH] Improve handling of Unicode byte-order marks (BOMs).
* libguile/ports-internal.h (struct scm_port_internal): Add new members
'at_st
re's the patch. Comments and suggestions solicited.
Mark
From 008b89c7ba4637e2d6323f02b6b8b6284a533857 Mon Sep 17 00:00:00 2001
From: Mark H Weaver
Date: Wed, 3 Apr 2013 04:22:04 -0400
Subject: [PATCH] Improve handling of Unicode byte-order marks (BOMs).
* libguile/ports-internal.h
; Date: Wed, 30 Jan 2013 10:17:25 +0100
>> Subject: [PATCH] detect and consume byte-order marks for textual ports
>>
>> * libguile/ports.h:
>> * libguile/ports.c (scm_consume_byte_order_mark): New procedure.
>>
>> * libguile/fports.c (scm_open_file): Call co
tle brain and
mailbox don’t get confused? :-)
> From 5512fe4f93e4e583ab538ae02dd98e5825252dc9 Mon Sep 17 00:00:00 2001
> From: Andy Wingo
> Date: Wed, 30 Jan 2013 10:17:25 +0100
> Subject: [PATCH] detect and consume byte-order marks for textual ports
>
> * libguile/ports.h:
> *
BOM is already in the previously specified encoding.
I will punt on this one.
From 5512fe4f93e4e583ab538ae02dd98e5825252dc9 Mon Sep 17 00:00:00 2001
From: Andy Wingo
Date: Wed, 30 Jan 2013 10:17:25 +0100
Subject: [PATCH] detect and consume byte-order marks for textual ports
* libguile/ports.h:
* libguile/ports.
Mark H Weaver writes:
> I wrote:
>> Having slept on this, I think I agree that 'open-input-file' should
>> auto-consume BOMs.
Good.
> So what should (open-file FILENAME "r+") do?
What about doing the same as for just “r”? I can’t think of any
reasonable scenario where this could be a problem
Andy Wingo writes:
> On Tue 29 Jan 2013 20:22, Neil Jerram writes:
>
>> (define (read-csv file-name)
>> (let ((s (utf16->string (get-bytevector-all (open-input-file file-name))
>> 'little)))
>>
>> ;; Discard possible byte order mark.
>> (if (and (>= (string-lengt
On Tue 29 Jan 2013 20:22, Neil Jerram writes:
> (define (read-csv file-name)
> (let ((s (utf16->string (get-bytevector-all (open-input-file file-name))
> 'little)))
>
> ;; Discard possible byte order mark.
> (if (and (>= (string-length s) 1)
> (char=?
Mark H Weaver writes:
>>> However, there’s no way to open a file in binary mode when using
>>> ‘open-input-file’, ‘call-with-input-file’, etc.
>>
>> We can add keyword or optional arguments of course. (Not suggesting
>> that we do so at this time though.)
>
> This has been on my TODO list for a
Andy Wingo writes:
> What do people think about this attached patch?
>
> Andy
>
>
> From 831c3418941f2d643f91e3076ef9458f700a2c59 Mon Sep 17 00:00:00 2001
> From: Andy Wingo
> Date: Mon, 28 Jan 2013 22:41:34 +0100
> Subject: [PATCH] detect and consume byte-order ma
I wrote:
> Having slept on this, I think I agree that 'open-input-file' should
> auto-consume BOMs.
On the other hand, there's a nasty complication. Of course
(open-input-file FILENAME) is just (open-file FILENAME "r"), so the
auto-consuming logic should be in 'open-file'.
So what should (open-f
Hi,
l...@gnu.org (Ludovic Courtès) writes:
>> For textual files, it doesn’t seem unreasonable for ‘open-input-file’ to
>> consume the BOM, IMO. It’s not much different from the ‘eol-style’
>> transcoders.
Andy Wingo writes:
> I could go either way. I would prefer for open-input-file to consume
Hi,
[Ludo and Mark and I scribas]:
>>> * 'open-input-file' could perhaps auto-consume a BOM at the beginning of
>>> the stream, but *only* if the BOM is already in the encoding specified
>>> by the user (possibly via an explicit call to 'file-encoding').
>>
>> The problem is that we have no wa
Andy Wingo writes:
[...]
>> Regarding byte-order marks, my preference is that users should explicitly
>> consume BOMs if that's what they want (ideally using some convenience
>> procedure provided by Guile). Sometimes consuming the BOM is the wrong
>> thing.
On Mon 28 Jan 2013 23:20, Mike Gran writes:
> So if there is a "coding:" line in the doc, I think it
> should nullify giving precedence to a UTF-16 BOM.
OK.
Cheers,
Andy
--
http://wingolog.org/
-file' in stable-2.0. At the
> very least it should be removed from master.
I agree as well. Want to make a patch?
> Regarding byte-order marks, my preference is that users should explicitly
> consume BOMs if that's what they want (ideally using some convenience
> procedure
built using Guile, and on that basis would advocate removing
the existing cleverness from 'open-input-file' in stable-2.0. At the
very least it should be removed from master.
Regarding byte-order marks, my preference is that users should explicitly
consume BOMs if that's w
> What do people think about this attached patch?
>
> Andy
If you find the word
"coding" by scanning 8-bit char by 8-bit char, it can't
be UTF-16, since that would be more like
"c o d i n g :" with nulls interspersed.
While rather unlikely, it is a theoretical possibility
that a doc in encoding
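The point about scanning byte by byte can be checked with a small sketch: a plain byte search for "coding" succeeds on ASCII-compatible text but fails on the same text encoded as UTF-16, because a zero byte sits between the letters. The function contains_bytes is a hypothetical helper written for this illustration and is not the scanner in libguile/read.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if the NLEN bytes at NEEDLE occur
   anywhere in the HLEN bytes at HAY, 0 otherwise.  */
int
contains_bytes (const unsigned char *hay, size_t hlen,
                const char *needle, size_t nlen)
{
  size_t i;

  if (nlen == 0)
    return 1;
  if (nlen > hlen)
    return 0;
  for (i = 0; i + nlen <= hlen; i++)
    if (memcmp (hay + i, needle, nlen) == 0)
      return 1;
  return 0;
}
```

Searching the bytes of ";; coding: utf-8" finds the word, whereas searching the UTF-16LE bytes 'c' 0 'o' 0 'd' 0 ... does not.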
What do people think about this attached patch?
Andy
From 831c3418941f2d643f91e3076ef9458f700a2c59 Mon Sep 17 00:00:00 2001
From: Andy Wingo
Date: Mon, 28 Jan 2013 22:41:34 +0100
Subject: [PATCH] detect and consume byte-order marks for textual ports
* libguile/read.c (scm_i_scan_for_encod