date:20130403

Re: [PATCH] Move slow path out of 'scm_get_byte_or_eof' et al

2013-04-03 Thread Ludovic Courtès

Mark H Weaver  skribis:

> Fair enough.  First of all, this patch reduces the size of the built
> libguile.so by about 42 kilobytes, and libguile.a by about 96 kilobytes.
>
> As for performance: the updated patches (attached below) slow things
> down by about one quarter of one percent on my machine.  The specific
> benchmark I did was to call 'read-string' on an 11 megabyte ASCII text
> file, with the port-encoding set to UTF-8.  In this case, all the reads
> are done using 'scm_get_byte_or_eof' (called from 'get_utf8_codepoint').
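The technique named in the subject line keeps only the cheap "byte already buffered" case inline and moves the refill logic into a separate out-of-line function. Inlined copies at many call sites then stay small, which is one plausible source of the size reduction reported above. Below is a minimal model under stated assumptions. `toy_port`, `toy_get_byte` and `toy_fill` are hypothetical names and are not Guile's `scm_t_port` or `scm_get_byte_or_eof`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_EOF (-1)

/* Hypothetical toy port: a small read buffer refilled from SRC.  */
struct toy_port
{
  const unsigned char *src;   /* unread underlying data */
  size_t src_len;
  unsigned char buf[4];       /* deliberately tiny read buffer */
  size_t pos, end;            /* buffered bytes are buf[pos..end) */
  int refills;                /* how often the slow path ran */
};

/* Slow path, kept out of line: refill the buffer and return its first
   byte, or TOY_EOF when the source is exhausted.  */
static int
toy_fill (struct toy_port *p)
{
  size_t n = p->src_len < sizeof p->buf ? p->src_len : sizeof p->buf;

  assert (p->pos == p->end);  /* only called when the buffer is empty */
  p->refills++;
  if (n == 0)
    return TOY_EOF;
  memcpy (p->buf, p->src, n);
  p->src += n;
  p->src_len -= n;
  p->pos = 0;
  p->end = n;
  return p->buf[p->pos++];
}

/* Fast path, small enough to inline at every call site.  */
static inline int
toy_get_byte (struct toy_port *p)
{
  if (p->pos < p->end)
    return p->buf[p->pos++];
  return toy_fill (p);
}
```

Reading the 12 bytes of "hello, world" through the 4-byte buffer takes the slow path four times: three refills, plus the final call that reports EOF.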

OK.  Interesting.

> I've attached the new patches.  Okay to push now?

Yes!

Thanks for the analysis,
Ludo’.

[PATCH] Improve handling of Unicode byte-order marks (BOMs)

2013-04-03 Thread Mark H Weaver

Hello all,

I've attached a proposed patch to improve our handling of BOMs.
Here are a few notable aspects:

* All kinds of streams are supported in a uniform way: files, pipes,
  sockets, terminals, etc.

* As specified in Unicode 6.2, BOMs are only handled specially at the
  start of a stream, and only if the encoding is set to "UTF-16" or
  "UTF-32".  BOMs are *not* handled specially if the encoding is set to
  "UTF-16LE", etc.

* This code never tries to read a BOM until the user has asked to read.
  If the user writes before reading, it chooses big-endian and writes a
  BOM if appropriate (if the encoding is set to "UTF-16" or "UTF-32").

* The encodings "UTF-16" and "UTF-32" are *never* passed to iconv,
  because BOM handling varies between iconv implementations.  Creation
  of the iconv descriptors is always postponed until the first read or
  write, at which point a decision is made about the endianness, and
  then "UTF-16BE", "UTF-16LE", "UTF-32BE", or "UTF-32LE" is passed to
  iconv.

* If 'rw_random' is zero, then the input and output streams are
  considered independent: the first read will consume a BOM if
  appropriate, *and* the first write will produce a BOM if appropriate.

* If 'rw_random' is non-zero, then the input and output streams are
  considered linked: if the user reads first, then a BOM will be
  consumed if appropriate, but later writes will *not* produce a BOM.
  Similarly, if the user writes first, then later reads will *not*
  consume a BOM.

* If 'set-port-encoding!' is called in the middle of a stream, it treats
  it as a new logical "start of stream", i.e. if the encoding is set to
  "UTF-16" or "UTF-32" then a BOM will be consumed the next time you
  read and/or produced the next time you write.

* Seeks to the beginning of the file set the "start of stream" flags.
  Seeks anywhere else clear the "start of stream" flags.
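The flag rules listed above can be modelled in a few lines of C. This is a minimal sketch that assumes one read flag and one write flag per port, as in the patch. Only the field names `at_stream_start_for_bom_read` and `at_stream_start_for_bom_write` come from the patch. The struct and the `bom_on_*` functions are hypothetical, and the model ignores the encoding itself: it only reports whether a BOM may be consumed or should be emitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-port "start of stream" flags.  */
struct bom_flags
{
  bool rw_random;                        /* input and output are linked */
  bool at_stream_start_for_bom_read;
  bool at_stream_start_for_bom_write;
};

static void
bom_flags_init (struct bom_flags *f, bool rw_random)
{
  f->rw_random = rw_random;
  f->at_stream_start_for_bom_read = true;
  f->at_stream_start_for_bom_write = true;
}

/* First text read: returns true if a leading BOM may be consumed.  */
static bool
bom_on_read (struct bom_flags *f)
{
  bool may_consume = f->at_stream_start_for_bom_read;

  f->at_stream_start_for_bom_read = false;
  if (may_consume && f->rw_random)
    f->at_stream_start_for_bom_write = false;   /* linked streams */
  return may_consume;
}

/* First text write: returns true if a BOM should be written.  */
static bool
bom_on_write (struct bom_flags *f)
{
  bool should_emit = f->at_stream_start_for_bom_write;

  f->at_stream_start_for_bom_write = false;
  if (should_emit && f->rw_random)
    f->at_stream_start_for_bom_read = false;    /* linked streams */
  return should_emit;
}

/* Seeking to offset 0 re-arms both flags; any other offset clears them.  */
static void
bom_on_seek (struct bom_flags *f, long offset)
{
  assert (offset >= 0);
  f->at_stream_start_for_bom_read = (offset == 0);
  f->at_stream_start_for_bom_write = (offset == 0);
}

/* 'set-port-encoding!' starts a new logical stream.  */
static void
bom_on_set_encoding (struct bom_flags *f)
{
  f->at_stream_start_for_bom_read = true;
  f->at_stream_start_for_bom_write = true;
}
```

With `rw_random` zero, a read and a later write each get their own BOM decision. With `rw_random` non-zero, whichever happens first disarms the other.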

Okay, here's the patch.  Comments and suggestions solicited.

 Mark


From 008b89c7ba4637e2d6323f02b6b8b6284a533857 Mon Sep 17 00:00:00 2001
From: Mark H Weaver 
Date: Wed, 3 Apr 2013 04:22:04 -0400
Subject: [PATCH] Improve handling of Unicode byte-order marks (BOMs).

* libguile/ports-internal.h (struct scm_port_internal): Add new members
  'at_stream_start_for_bom_read' and 'at_stream_start_for_bom_write'.
  (SCM_UNICODE_BOM): New macro.
  (scm_i_port_iconv_descriptors): Add 'mode' parameter to prototype.

* libguile/ports.c (scm_new_port_table_entry): Initialize
  'at_stream_start_for_bom_read' and 'at_stream_start_for_bom_write'.
  (get_iconv_codepoint): Pass new 'mode' parameter to
  'scm_i_port_iconv_descriptors'.
  (get_codepoint): After reading a codepoint at stream start, record
  that we're no longer at stream start, and consume a BOM where
  appropriate.
  (scm_seek): Set the stream start flags according to the new position.
  (looking_at_bytes): New static function.
  (scm_utf8_bom, scm_utf16be_bom, scm_utf16le_bom, scm_utf32be_bom,
  scm_utf32le_bom): New static const arrays.
  (decide_utf16_encoding, decide_utf32_encoding): New static functions.
  (scm_i_port_iconv_descriptors): Add new 'mode' parameter.  If the
  specified encoding is UTF-16 or UTF-32, make that precise by deciding
  what endianness to use, and construct iconv descriptors based on the
  precise encoding.
  (scm_i_set_port_encoding_x): Record that we are now at stream start.
  Do not open the new iconv descriptors immediately; let them be
  initialized lazily.

* libguile/print.c (display_string_using_iconv): Record that we're no
  longer at stream start.  Write a BOM if appropriate.

* test-suite/tests/ports.test ("set-port-encoding!, wrong encoding"):
  Adapt test to cope with the fact that 'set-port-encoding!' does not
  immediately open the iconv descriptors.
  (bv-read-test): New procedure.
  ("unicode byte-order marks (BOMs)"): New test prefix.
---
 libguile/ports-internal.h   |    7 +-
 libguile/ports.c            |  134 +++---
 libguile/print.c            |   18 ++-
 test-suite/tests/ports.test |  259 ++-
 4 files changed, 399 insertions(+), 19 deletions(-)

diff --git a/libguile/ports-internal.h b/libguile/ports-internal.h
index 73a788f..cd1746b 100644
--- a/libguile/ports-internal.h
+++ b/libguile/ports-internal.h
@@ -48,14 +48,19 @@ struct scm_port_internal
 {
   scm_t_port_encoding_mode encoding_mode;
   scm_t_iconv_descriptors *iconv_descriptors;
+  int at_stream_start_for_bom_read;
+  int at_stream_start_for_bom_write;
   SCM alist;
 };
 
 typedef struct scm_port_internal scm_t_port_internal;
 
+#define SCM_UNICODE_BOM  0xFEFF  /* Unicode byte-order mark */
+
 #define SCM_PORT_GET_INTERNAL(x)\
   ((scm_t_port_internal *) (SCM_PTAB_ENTRY(x)->input_cd))
 
-SCM_INTERNAL scm_t_iconv_descriptors *scm_i_port_iconv_descriptors (SCM port);
+SCM_INTERNAL scm_t_iconv_descriptors *
+scm_i_port_iconv_descriptors (SCM port, scm_t_port_rw_active mode);
 
 #endif
diff --git a/libguil

Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs)

2013-04-03 Thread Mark H Weaver

Here's an improved version of the patch.  Mainly it adds more tests.
Also, I forgot to mention that binary I/O does not affect the "start of
stream" flags at all.  This is mainly for efficiency reasons, but even
so, I don't feel too badly about it.

 Mark


From d8d37d5519ca61961b70cb3051ccca2be7d4affa Mon Sep 17 00:00:00 2001
From: Mark H Weaver 
Date: Wed, 3 Apr 2013 04:22:04 -0400
Subject: [PATCH] Improve handling of Unicode byte-order marks (BOMs).

* libguile/ports-internal.h (struct scm_port_internal): Add new members
  'at_stream_start_for_bom_read' and 'at_stream_start_for_bom_write'.
  (SCM_UNICODE_BOM): New macro.
  (scm_i_port_iconv_descriptors): Add 'mode' parameter to prototype.

* libguile/ports.c (scm_new_port_table_entry): Initialize
  'at_stream_start_for_bom_read' and 'at_stream_start_for_bom_write'.
  (get_iconv_codepoint): Pass new 'mode' parameter to
  'scm_i_port_iconv_descriptors'.
  (get_codepoint): After reading a codepoint at stream start, record
  that we're no longer at stream start, and consume a BOM where
  appropriate.
  (scm_seek): Set the stream start flags according to the new position.
  (looking_at_bytes): New static function.
  (scm_utf8_bom, scm_utf16be_bom, scm_utf16le_bom, scm_utf32be_bom,
  scm_utf32le_bom): New static const arrays.
  (decide_utf16_encoding, decide_utf32_encoding): New static functions.
  (scm_i_port_iconv_descriptors): Add new 'mode' parameter.  If the
  specified encoding is UTF-16 or UTF-32, make that precise by deciding
  what endianness to use, and construct iconv descriptors based on the
  precise encoding.
  (scm_i_set_port_encoding_x): Record that we are now at stream start.
  Do not open the new iconv descriptors immediately; let them be
  initialized lazily.

* libguile/print.c (display_string_using_iconv): Record that we're no
  longer at stream start.  Write a BOM if appropriate.

* test-suite/tests/ports.test ("set-port-encoding!, wrong encoding"):
  Adapt test to cope with the fact that 'set-port-encoding!' does not
  immediately open the iconv descriptors.
  (bv-read-test): New procedure.
  ("unicode byte-order marks (BOMs)"): New test prefix.
---
 libguile/ports-internal.h   |    7 +-
 libguile/ports.c            |  134 +---
 libguile/print.c            |   18 ++-
 test-suite/tests/ports.test |  293 ++-
 4 files changed, 433 insertions(+), 19 deletions(-)

diff --git a/libguile/ports-internal.h b/libguile/ports-internal.h
index 73a788f..cd1746b 100644
--- a/libguile/ports-internal.h
+++ b/libguile/ports-internal.h
@@ -48,14 +48,19 @@ struct scm_port_internal
 {
   scm_t_port_encoding_mode encoding_mode;
   scm_t_iconv_descriptors *iconv_descriptors;
+  int at_stream_start_for_bom_read;
+  int at_stream_start_for_bom_write;
   SCM alist;
 };
 
 typedef struct scm_port_internal scm_t_port_internal;
 
+#define SCM_UNICODE_BOM  0xFEFF  /* Unicode byte-order mark */
+
 #define SCM_PORT_GET_INTERNAL(x)\
   ((scm_t_port_internal *) (SCM_PTAB_ENTRY(x)->input_cd))
 
-SCM_INTERNAL scm_t_iconv_descriptors *scm_i_port_iconv_descriptors (SCM port);
+SCM_INTERNAL scm_t_iconv_descriptors *
+scm_i_port_iconv_descriptors (SCM port, scm_t_port_rw_active mode);
 
 #endif
diff --git a/libguile/ports.c b/libguile/ports.c
index 51145e6..99261da 100644
--- a/libguile/ports.c
+++ b/libguile/ports.c
@@ -639,6 +639,9 @@ scm_new_port_table_entry (scm_t_bits tag)
     pti->encoding_mode = SCM_PORT_ENCODING_MODE_ICONV;
   pti->iconv_descriptors = NULL;
 
+  pti->at_stream_start_for_bom_read  = 1;
+  pti->at_stream_start_for_bom_write = 1;
+
   /* XXX These fields are not what they seem.  They have been
      repurposed, but cannot safely be renamed in 2.0 without breaking
      ABI compatibility.  This will be cleaned up in 2.2.  */
@@ -1306,10 +1309,12 @@ static int
 get_iconv_codepoint (SCM port, scm_t_wchar *codepoint,
 		 char buf[SCM_MBCHAR_BUF_SIZE], size_t *len)
 {
-  scm_t_iconv_descriptors *id = scm_i_port_iconv_descriptors (port);
+  scm_t_iconv_descriptors *id;
   scm_t_uint8 utf8_buf[SCM_MBCHAR_BUF_SIZE];
   size_t input_size = 0;
 
+  id = scm_i_port_iconv_descriptors (port, SCM_PORT_READ);
+
   for (;;)
     {
       int byte_read;
@@ -1393,7 +1398,24 @@ get_codepoint (SCM port, scm_t_wchar *codepoint,
     err = get_iconv_codepoint (port, codepoint, buf, len);
 
   if (SCM_LIKELY (err == 0))
-    update_port_lf (*codepoint, port);
+    {
+      if (SCM_UNLIKELY (pti->at_stream_start_for_bom_read))
+        {
+          /* Record that we're no longer at stream start. */
+          pti->at_stream_start_for_bom_read = 0;
+          if (pt->rw_random)
+            pti->at_stream_start_for_bom_write = 0;
+
+          /* If we just read a BOM in an encoding that recognizes them,
+             then silently consume it and read another code point. */
+          if (SCM_UNLIKELY (*codepoint == SCM_UNICODE_BOM
+                            && (strcmp(pt->encoding

Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs)

2013-04-03 Thread Ludovic Courtès

Hello, Mark!

Mark H Weaver  skribis:

> * All kinds of streams are supported in a uniform way: files, pipes,
>   sockets, terminals, etc.
>
> * As specified in Unicode 6.2, BOMs are only handled specially at the
>   start of a stream, and only if the encoding is set to "UTF-16" or
>   "UTF-32".  BOMs are *not* handled specially if the encoding is set to
>   "UTF-16LE", etc.

OK.

> * This code never tries to read a BOM until the user has asked to read.
>   If the user writes before reading, it chooses big-endian and writes a
>   BOM if appropriate (if the encoding is set to "UTF-16" or "UTF-32").
>
> * The encodings "UTF-16" and "UTF-32" are *never* passed to iconv,
>   because BOM handling varies between iconv implementations.  Creation
>   of the iconv descriptors is always postponed until the first read or
>   write, at which point a decision is made about the endianness, and
>   then "UTF-16BE", "UTF-16LE", "UTF-32BE", or "UTF-32LE" is passed to
>   iconv.
>
> * If 'rw_random' is zero, then the input and output streams are
>   considered independent: the first read will consume a BOM if
>   appropriate, *and* the first write will produce a BOM if appropriate.
>
> * If 'rw_random' is non-zero, then the input and output streams are
>   considered linked: if the user reads first, then a BOM will be
>   consumed if appropriate, but later writes will *not* produce a BOM.
>   Similarly, if the user writes first, then later reads will *not*
>   consume a BOM.
>
> * If 'set-port-encoding!' is called in the middle of a stream, it treats
>   it as a new logical "start of stream", i.e. if the encoding is set to
>   "UTF-16" or "UTF-32" then a BOM will be consumed the next time you
>   read and/or produced the next time you write.
>
> * Seeks to the beginning of the file set the "start of stream" flags.
>   Seeks anywhere else clear the "start of stream" flags.

Woow, well thought out.  The semantics seem good.  (It’s interesting to
see how BOMs complicate things, but that’s life, I guess.)

The patch looks good to me.  The test suite is nice.  It doesn’t seem to
cover all the corner cases listed above, but that can be added later on
perhaps?

Perhaps the text above could be added to the manual, in a
@unnumberedsec or something?

Remarks:

> diff --git a/libguile/ports-internal.h b/libguile/ports-internal.h
> index 73a788f..cd1746b 100644
> --- a/libguile/ports-internal.h
> +++ b/libguile/ports-internal.h
> @@ -48,14 +48,19 @@ struct scm_port_internal
>  {
>    scm_t_port_encoding_mode encoding_mode;
>    scm_t_iconv_descriptors *iconv_descriptors;
> +  int at_stream_start_for_bom_read;
> +  int at_stream_start_for_bom_write;

Add “:1”?

> +#define SCM_UNICODE_BOM  0xFEFF  /* Unicode byte-order mark */

0xfeffUL to be on the safe side.

> +/* If the next LEN bytes from port are equal to those in BYTES, then

s/port/PORT/

> +   return 1, else return 0.  Leave the port position unchanged.  */
> +static int
> +looking_at_bytes (SCM port, unsigned char *bytes, int len)

const unsigned char *bytes

> +{
> +  scm_t_port *pt = SCM_PTAB_ENTRY (port);
> +  int result;
> +  int i = 0;
> +
> +  while (i < len && scm_peek_byte_or_eof (port) == bytes[i])
> +    {
> +      pt->read_pos++;
> +      i++;
> +    }
> +
> +  result = (i == len);
> +
> +  while (i > 0)
> +    scm_unget_byte (bytes[--i], port);
> +
> +  return result;
> +}

Should it be scm_get_byte_or_eof given that scm_unget_byte is used later?

What if pt->read_buf_size == 1?  What if there’s data in saved_read_buf?

> +/* Decide what endianness to use for a UTF-16 port.  Return "UTF-16BE"
> +   or "UTF-16LE".  MODE must be either SCM_PORT_READ or SCM_PORT_WRITE,
> +   and specifies which operation is about to be done.  The MODE
> +   determines how we will decide the endianness.  We deliberately avoid
> +   reading from the port unless the user is about to do so.  If the user
> +   is about to read, then we look for a BOM, and if present, we use it
> +   to determine the endianness.  Otherwise we choose big-endian, as
> +   recommended by the Unicode Consortium.  */
> +static char *
> +decide_utf16_encoding (SCM port, scm_t_port_rw_active mode)

static const char *

> +static char *
> +decide_utf32_encoding (SCM port, scm_t_port_rw_active mode)

Likewise.
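The endianness decision described in the quoted comment can be sketched as a pure function over bytes already peeked from the port. This is an assumption-laden model: the real patch peeks through `looking_at_bytes` on the port itself, while `sketch_decide_utf16`, `sketch_decide_utf32` and `io_mode` here are hypothetical. The BOM byte values are the standard ones, and the functions return `static const char *`-style strings, in line with the review remark above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Standard BOM byte sequences.  */
static const unsigned char utf16be_bom[] = { 0xFE, 0xFF };
static const unsigned char utf16le_bom[] = { 0xFF, 0xFE };
static const unsigned char utf32be_bom[] = { 0x00, 0x00, 0xFE, 0xFF };
static const unsigned char utf32le_bom[] = { 0xFF, 0xFE, 0x00, 0x00 };

/* Hypothetical stand-in for the patch's read/write mode argument.  */
typedef enum { MODE_READ, MODE_WRITE } io_mode;

/* Nonzero if the AVAIL bytes at BUF begin with the LEN bytes of BOM.  */
static int
starts_with (const unsigned char *buf, size_t avail,
             const unsigned char *bom, size_t len)
{
  assert (buf != NULL || avail == 0);
  return avail >= len && memcmp (buf, bom, len) == 0;
}

/* Pick a precise UTF-16 variant.  Peeked data is consulted only when
   the caller is about to read; otherwise default to big-endian.  */
static const char *
sketch_decide_utf16 (io_mode mode, const unsigned char *peeked, size_t avail)
{
  if (mode == MODE_READ)
    {
      if (starts_with (peeked, avail, utf16be_bom, sizeof utf16be_bom))
        return "UTF-16BE";
      if (starts_with (peeked, avail, utf16le_bom, sizeof utf16le_bom))
        return "UTF-16LE";
    }
  return "UTF-16BE";
}

/* Same decision for a "UTF-32" port, using the 4-byte BOMs.  */
static const char *
sketch_decide_utf32 (io_mode mode, const unsigned char *peeked, size_t avail)
{
  if (mode == MODE_READ)
    {
      if (starts_with (peeked, avail, utf32be_bom, sizeof utf32be_bom))
        return "UTF-32BE";
      if (starts_with (peeked, avail, utf32le_bom, sizeof utf32le_bom))
        return "UTF-32LE";
    }
  return "UTF-32BE";
}
```

For a "UTF-32" port, all four bytes must be checked, because the UTF-32LE BOM `FF FE 00 00` begins with the UTF-16LE BOM `FF FE`.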

> +  /* If the specified encoding is UTF-16 or UTF-32, then make
> +     that more precise by deciding what endianness to use.  */
> +  if (strcmp (pt->encoding, "UTF-16") == 0)
> +    precise_encoding = decide_utf16_encoding (port, mode);
> +  else if (strcmp (pt->encoding, "UTF-32") == 0)
> +    precise_encoding = decide_utf32_encoding (port, mode);

Shouldn’t it be strcasecmp?  (Actually there are other uses of strcmp
already, but I think it’s a mistake.)
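If case-insensitive matching is wanted, one option is `strcasecmp` from POSIX `<strings.h>`. It is not ISO C, though, and POSIX only specifies its behaviour in the POSIX locale. The sketch below folds ASCII letters only, which is usually all that encoding names need. `encoding_name_equal` is a hypothetical helper, not something the patch defines.

```c
#include <assert.h>
#include <stddef.h>

/* ASCII-only case folding, independent of the current locale.  */
static int
ascii_tolower (int c)
{
  return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c;
}

/* Nonzero if encoding names A and B are equal, ignoring ASCII case.  */
static int
encoding_name_equal (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);
  for (; *a && *b; a++, b++)
    if (ascii_tolower ((unsigned char) *a)
        != ascii_tolower ((unsigned char) *b))
      return 0;
  return *a == *b;   /* equal only if both strings ended together */
}
```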

> +  if (SCM_UNLIKELY (strcmp(pt->encoding, "UTF-16") == 0
> +                    || strcmp(pt->encoding, "UTF-32") == 0))

Likewise, + space before paren.

Thanks!

Ludo’.

Re: array-copy! is slow & array-map.c

2013-04-03 Thread Ludovic Courtès

Hi Daniel,

Daniel Llorens  skribis:

> --- a/libguile/array-map.h
> +++ b/libguile/array-map.h
> @@ -34,16 +34,6 @@ SCM_API int scm_ramapc (void *cproc, SCM data, SCM ra0, 
> SCM lra,
>  SCM_API int scm_array_fill_int (SCM ra, SCM fill, SCM ignore);
>  SCM_API SCM scm_array_fill_x (SCM ra, SCM fill);
>  SCM_API SCM scm_array_copy_x (SCM src, SCM dst);
> -SCM_API int scm_ra_eqp (SCM ra0, SCM ras);
> -SCM_API int scm_ra_lessp (SCM ra0, SCM ras);
> -SCM_API int scm_ra_leqp (SCM ra0, SCM ras);
> -SCM_API int scm_ra_grp (SCM ra0, SCM ras);
> -SCM_API int scm_ra_greqp (SCM ra0, SCM ras);
> -SCM_API int scm_ra_sum (SCM ra0, SCM ras);
> -SCM_API int scm_ra_difference (SCM ra0, SCM ras);
> -SCM_API int scm_ra_product (SCM ra0, SCM ras);
> -SCM_API int scm_ra_divide (SCM ra0, SCM ras);
> -SCM_API int scm_array_identity (SCM src, SCM dst);
>  SCM_API SCM scm_array_map_x (SCM ra0, SCM proc, SCM lra);
>  SCM_API SCM scm_array_for_each (SCM proc, SCM ra0, SCM lra);
>  SCM_API SCM scm_array_index_map_x (SCM ra, SCM proc);

This is problematic since they are part of the API.  But they’re
undocumented, and as you say, most likely dead code.

So, can you instead deprecate them?  You need to replace SCM_API with
SCM_DEPRECATED, and put them (body and declarations) in

  #if SCM_ENABLE_DEPRECATED == 1

Hopefully they can be removed in 2.2 or so.
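The requested change roughly takes the shape sketched below, with the declarations and the bodies guarded by the same conditional. The sketch is self-contained: the `SCM_API` and `SCM_DEPRECATED` definitions are stand-ins so that it compiles outside libguile, where Guile's own headers provide the real ones. `demo_array_fill` and `demo_ra_sum` are hypothetical functions, not the `scm_ra_*` procedures under discussion.

```c
#include <assert.h>

/* Stand-ins so this sketch compiles on its own; in Guile these macros
   come from libguile's headers, with their own definitions.  */
#ifndef SCM_ENABLE_DEPRECATED
# define SCM_ENABLE_DEPRECATED 1
#endif
#define SCM_API extern
#define SCM_DEPRECATED extern

/* Header part: supported API stays as is...  */
SCM_API int demo_array_fill (int *v, int n, int fill);

/* ...while the deprecated declaration is guarded.  */
#if SCM_ENABLE_DEPRECATED == 1
SCM_DEPRECATED int demo_ra_sum (const int *v, int n);
#endif

/* Implementation part.  */
int
demo_array_fill (int *v, int n, int fill)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    v[i] = fill;
  return n;
}

#if SCM_ENABLE_DEPRECATED == 1
/* Kept only for backward compatibility; slated for removal.  */
int
demo_ra_sum (const int *v, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += v[i];
  return sum;
}
#endif
```

Building with deprecated features disabled (`SCM_ENABLE_DEPRECATED` set to 0) drops both the declaration and the body.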

TIA,
Ludo’.

Re: array-copy! is slow & array-map.c

2013-04-03 Thread Ludovic Courtès

Daniel Llorens  skribis:

> From a38b0a98ed6093ee9ebe4ac60b4b6f9efbdcfdd5 Mon Sep 17 00:00:00 2001
> From: Daniel Llorens 
> Date: Mon, 1 Apr 2013 18:43:58 +0200
> Subject: [PATCH 2/2] Remove double indirection in element access in
>  array-copy!
>
> * libguile/array-map.c: (racp): factor scm_generalized_vector_ref,
>   scm_generalized_vector_set_x out of the rank-1 loop.
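Roughly, "double indirection" here means going through a generic element accessor, with its own lookup, for every element. The change does that lookup once per rank-1 loop and then indexes directly. The simplified model below assumes the per-element cost is such a lookup. `toy_array`, `toy_handle` and `toy_get_handle` are hypothetical stand-ins, loosely analogous to array handles, and the `lookups` counter exists only to make the difference observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array: data plus a stride (non-compact if stride != 1).  */
struct toy_array { const double *data; ptrdiff_t len, stride; };

/* Resolved access information, standing in for an array handle.  */
struct toy_handle { const double *elts; ptrdiff_t stride; };

static int lookups;   /* how often the generic lookup ran */

static struct toy_handle
toy_get_handle (const struct toy_array *a)
{
  assert (a != NULL);
  lookups++;
  struct toy_handle h = { a->data, a->stride };
  return h;
}

/* Generic element ref: redoes the lookup on every call.  */
static double
toy_generic_ref (const struct toy_array *a, ptrdiff_t k)
{
  struct toy_handle h = toy_get_handle (a);
  return h.elts[k * h.stride];
}

/* Before: one lookup per element.  */
static void
copy_per_element (const struct toy_array *src, double *dst)
{
  for (ptrdiff_t k = 0; k < src->len; k++)
    dst[k] = toy_generic_ref (src, k);
}

/* After: the lookup is hoisted out of the rank-1 loop.  */
static void
copy_hoisted (const struct toy_array *src, double *dst)
{
  struct toy_handle h = toy_get_handle (src);
  for (ptrdiff_t k = 0; k < src->len; k++)
    dst[k] = h.elts[k * h.stride];
}
```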

Applied, with the addition of the couple of ‘scm_array_handle_release’
calls that were missing.

Thanks,
Ludo’.

Re: array-copy! is slow & array-map.c

2013-04-03 Thread Ludovic Courtès

Daniel Llorens  skribis:

> These two patches do it for array-map!.
>
> The first patch avoids cons for the 1-argument case. The second patch removes 
> the double indirection for the first two arguments in every case. Since 
> there's some work done inside the loop, the improvement is smaller than for 
> array-copy! or array-fill!.
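The following C model illustrates what avoiding the cons in the 1-argument case buys, under the assumption that the general path builds a fresh argument list per element. Here a heap-allocated buffer stands in for consing a Scheme list. The one-source special case calls the procedure directly and allocates nothing. All names (`map_general`, `map_one`, `allocations`, `square`, `sum_args`) are hypothetical. Guile's `ramap` works on `SCM` values, not doubles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef double (*unary_fn) (double);
typedef double (*nary_fn) (const double *args, size_t nargs);

static int allocations;   /* per-element argument buffers allocated */

/* General path: gather the k-th element of every source into a fresh
   argument buffer, then call F on it.  */
static void
map_general (double *dst, size_t n,
             const double *const *srcs, size_t nsrcs, nary_fn f)
{
  assert (nsrcs > 0);
  for (size_t k = 0; k < n; k++)
    {
      double *args = malloc (nsrcs * sizeof *args);
      assert (args != NULL);
      allocations++;
      for (size_t j = 0; j < nsrcs; j++)
        args[j] = srcs[j][k];
      dst[k] = f (args, nsrcs);
      free (args);
    }
}

/* Special case for exactly one source: no per-element allocation.  */
static void
map_one (double *dst, size_t n, const double *src, unary_fn f)
{
  for (size_t k = 0; k < n; k++)
    dst[k] = f (src[k]);
}

/* Example procedures for the two paths.  */
static double square (double x) { return x * x; }

static double
sum_args (const double *args, size_t nargs)
{
  double s = 0;
  for (size_t j = 0; j < nargs; j++)
    s += args[j];
  return s;
}
```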

Could you provide a patch that adds test cases for array-map! in
arrays.test?  I’d like to have a minimal safety net before applying
these.

TIA,  :-)
Ludo’.

Re: array-copy! is slow & array-map.c

2013-04-03 Thread Daniel Llorens


On Apr 3, 2013, at 14:50, Ludovic Courtès wrote:

> Daniel Llorens  skribis:
> 
>> These two patches do it for array-map!.
>> 
>> The first patch avoids cons for the 1-argument case. The second patch 
>> removes the double indirection for the first two arguments in every case. 
>> Since there's some work done inside the loop, the improvement is smaller 
>> than for array-copy! or array-fill!.
> 
> Could you provide a patch that adds test cases for array-map! in
> arrays.test?  I’d like to have a minimal safety net before applying
> these.
> 
> TIA,  :-)
> Ludo’.

The patches pass the tests in ramap.test. These include different types and 
non-compact arrays. I can add more if you think these aren't sufficient.

Here is the new patch set on top of the one you've already applied. The patch 
for array-fill! includes a test for non-compact arrays that I didn't see in 
arrays.test. 

Regards,

Daniel



0001-Deprecate-dead-code-in-array-map.c.patch
0002-Avoid-per-element-cons-for-1-arg-case-of-array-map.patch
0003-Remove-double-indirection-in-array-map-with-2-args.patch
0004-Remove-double-indirection-for-1st-arg-of-array-for-e.patch
0005-Remove-double-indirection-in-array-fill.patch

Re: array-copy! is slow & array-map.c

2013-04-03 Thread Ludovic Courtès

Daniel Llorens  skribis:

> From 3fba07976ac26b7e7def16679a7e7d1f92b2951c Mon Sep 17 00:00:00 2001
> From: Daniel Llorens 
> Date: Wed, 3 Apr 2013 15:40:48 +0200
> Subject: [PATCH 1/5] Deprecate dead code in array-map.c
>
> * libguile/array-map.c, libguile/array-map.h: deprecate scm_ra_eqp,
>   scm_ra_lessp, scm_ra_leqp, scm_ra_grp, scm_ra_greqp, scm_ra_sum,
>   scm_ra_product, scm_ra_difference, scm_ra_divide, scm_array_identity.

Applied, with the declarations enclosed in #if SCM_ENABLE_DEPRECATED.

Ludo’.

Re: array-copy! is slow & array-map.c

2013-04-03 Thread Ludovic Courtès

Daniel Llorens  skribis:

> From eb4bbb3f42a4a0fcf1d51ecacd557606533d5b40 Mon Sep 17 00:00:00 2001
> From: Daniel Llorens 
> Date: Tue, 2 Apr 2013 15:23:55 +0200
> Subject: [PATCH 2/5] Avoid per-element cons for 1-arg case of array-map!
>
> * libguile/array-map.c: (ramap): special case when ras is a 1-element list.

Applied.

Indeed, ‘ramap’ is covered by the tests:

  http://hydra.nixos.org/build/4285515/download/2/coverage/libguile/array-map.c.func.html

Ludo’.

Re: array-copy! is slow & array-map.c

2013-04-03 Thread Ludovic Courtès

Daniel Llorens  skribis:

> From 78322cca895275772028bebeb79ac5038df04047 Mon Sep 17 00:00:00 2001
> From: Daniel Llorens 
> Date: Tue, 2 Apr 2013 15:53:22 +0200
> Subject: [PATCH 3/5] Remove double indirection in array-map! with <2 args
>
> * libguile/array-map.c: (ramap): factor GVSET/GVREF out of rank-1 loop
>   for ra0 and the first element of ras.

Applied.

Ludo’.

Re: array-copy! is slow & array-map.c

2013-04-03 Thread Daniel Llorens


On Apr 3, 2013, at 19:06, Ludovic Courtès wrote:

> Daniel Llorens  skribis:
> 
>> From eb4bbb3f42a4a0fcf1d51ecacd557606533d5b40 Mon Sep 17 00:00:00 2001
>> From: Daniel Llorens 
>> Date: Tue, 2 Apr 2013 15:23:55 +0200
>> Subject: [PATCH 2/5] Avoid per-element cons for 1-arg case of array-map!
>> 
>> * libguile/array-map.c: (ramap): special case when ras is a 1-element list.
> 
> Applied.
> 
> Indeed, ‘ramap’ is covered by the tests:
>
>   http://hydra.nixos.org/build/4285515/download/2/coverage/libguile/array-map.c.func.html
> 
> Ludo’.

Nifty, I didn't know about this.

I've verified that there are indeed no tests on array-copy!. I'll prepare a 
patch.

Regards,

Daniel.

Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs)

2013-04-03 Thread Mark H Weaver

Hi Ludovic,

Thanks for the quick review!  An improved patch is attached below.

l...@gnu.org (Ludovic Courtès) writes:
> Woow, well thought out.  The semantics seem good.  (It’s interesting to
> see how BOMs complicate things, but that’s life, I guess.)
>
> The patch looks good to me.  The test suite is nice.  It doesn’t seem to
> cover all the corner cases listed above, but that can be added later on
> perhaps?

Yes, the tests are still a work-in-progress, but I've added quite a few
more since you last looked.

> Perhaps the text above could be added to the manual,

In the attached patch, I've added a new node to the "Input and Output"
section.

> Mark H Weaver  skribis:
>> diff --git a/libguile/ports-internal.h b/libguile/ports-internal.h
>> index 73a788f..cd1746b 100644
>> --- a/libguile/ports-internal.h
>> +++ b/libguile/ports-internal.h
>> @@ -48,14 +48,19 @@ struct scm_port_internal
>>  {
>>    scm_t_port_encoding_mode encoding_mode;
>>    scm_t_iconv_descriptors *iconv_descriptors;
>> +  int at_stream_start_for_bom_read;
>> +  int at_stream_start_for_bom_write;
>
> Add “:1”?

Good idea.

[...more good suggestions that I've incorporated in the new patch...]

>> +{
>> +  scm_t_port *pt = SCM_PTAB_ENTRY (port);
>> +  int result;
>> +  int i = 0;
>> +
>> +  while (i < len && scm_peek_byte_or_eof (port) == bytes[i])
>> +    {
>> +      pt->read_pos++;
>> +      i++;
>> +    }
>> +
>> +  result = (i == len);
>> +
>> +  while (i > 0)
>> +    scm_unget_byte (bytes[--i], port);
>> +
>> +  return result;
>> +}
>
> Should it be scm_get_byte_or_eof given that scm_unget_byte is used later?

Yes.  Bytes are only consumed if they are equal to bytes[i], so an EOF will
never be consumed or passed to scm_unget_byte.

> What if pt->read_buf_size == 1?  What if there’s data in saved_read_buf?

All of those details are handled by 'scm_peek_byte_or_eof', which is
guaranteed to leave 'pt->read_pos' pointing at the byte that's returned
(if not EOF).  Therefore, it's always safe to increment that pointer by
one (but no more than one) after calling 'scm_peek_byte_or_eof' if it
returned non-EOF.

Look at the code for 'scm_peek_byte_or_eof' and this will be clear.
Also note that you did the same thing in 'scm_utf8_codepoint' :)
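The argument above can be checked against a small model. Peek returns the byte at the current position, so advancing by exactly one after a successful, non-EOF peek consumes exactly that byte. Pushing the matched bytes back in reverse order then restores the stream. The `mem_port` type and its functions are hypothetical and much simpler than `scm_t_port`: there are no read-buffer refills and the unget area has a fixed size.

```c
#include <assert.h>
#include <stddef.h>

#define MEM_EOF (-1)

/* Hypothetical in-memory port with a small unget (pushback) stack.  */
struct mem_port
{
  const unsigned char *data;
  size_t len, pos;
  unsigned char pushback[8];
  size_t npush;               /* last pushed byte is read first */
};

/* Return the next byte without consuming it, or MEM_EOF.  */
static int
mem_peek (const struct mem_port *p)
{
  if (p->npush > 0)
    return p->pushback[p->npush - 1];
  return p->pos < p->len ? p->data[p->pos] : MEM_EOF;
}

/* Consume exactly the byte mem_peek just returned (never EOF).  */
static void
mem_advance (struct mem_port *p)
{
  if (p->npush > 0)
    p->npush--;
  else
    {
      assert (p->pos < p->len);
      p->pos++;
    }
}

static void
mem_unget (struct mem_port *p, unsigned char byte)
{
  assert (p->npush < sizeof p->pushback);
  p->pushback[p->npush++] = byte;
}

/* Same shape as looking_at_bytes: match LEN bytes, then push back
   whatever was consumed so the position is unchanged.  */
static int
mem_looking_at (struct mem_port *p, const unsigned char *bytes, size_t len)
{
  size_t i = 0;
  int result;

  while (i < len && mem_peek (p) == bytes[i])
    {
      mem_advance (p);
      i++;
    }
  result = (i == len);
  while (i > 0)
    mem_unget (p, bytes[--i]);
  return result;
}
```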

[...more good suggestions, incorporated...]

>> +  /* If the specified encoding is UTF-16 or UTF-32, then make
>> +     that more precise by deciding what endianness to use.  */
>> +  if (strcmp (pt->encoding, "UTF-16") == 0)
>> +    precise_encoding = decide_utf16_encoding (port, mode);
>> +  else if (strcmp (pt->encoding, "UTF-32") == 0)
>> +    precise_encoding = decide_utf32_encoding (port, mode);
>
> Shouldn’t it be strcasecmp?  (Actually there are other uses of strcmp
> already, but I think it’s a mistake.)

Ouch, good catch!  Indeed, we already had some bugs because of this.  I
pushed a fix for the existing bugs to stable-2.0, and updated this patch
accordingly.

Here's the new patch.  Any more suggestions?

Thanks!
  Mark

From c0d7228824dcaf7edcbc2de2cdef5c091ef2fc2f Mon Sep 17 00:00:00 2001
From: Mark H Weaver 
Date: Wed, 3 Apr 2013 04:22:04 -0400
Subject: [PATCH] Improve handling of Unicode byte-order marks (BOMs).

* libguile/ports-internal.h (struct scm_port_internal): Add new members
  'at_stream_start_for_bom_read' and 'at_stream_start_for_bom_write'.
  (SCM_UNICODE_BOM): New macro.
  (scm_i_port_iconv_descriptors): Add 'mode' parameter to prototype.

* libguile/ports.c (scm_new_port_table_entry): Initialize
  'at_stream_start_for_bom_read' and 'at_stream_start_for_bom_write'.
  (get_iconv_codepoint): Pass new 'mode' parameter to
  'scm_i_port_iconv_descriptors'.
  (get_codepoint): After reading a codepoint at stream start, record
  that we're no longer at stream start, and consume a BOM where
  appropriate.
  (scm_seek): Set the stream start flags according to the new position.
  (looking_at_bytes): New static function.
  (scm_utf8_bom, scm_utf16be_bom, scm_utf16le_bom, scm_utf32be_bom,
  scm_utf32le_bom): New static const arrays.
  (decide_utf16_encoding, decide_utf32_encoding): New static functions.
  (scm_i_port_iconv_descriptors): Add new 'mode' parameter.  If the
  specified encoding is UTF-16 or UTF-32, make that precise by deciding
  what endianness to use, and construct iconv descriptors based on the
  precise encoding.
  (scm_i_set_port_encoding_x): Record that we are now at stream start.
  Do not open the new iconv descriptors immediately; let them be
  initialized lazily.

* libguile/print.c (display_string_using_iconv): Record that we're no
  longer at stream start.  Write a BOM if appropriate.

* doc/ref/api-io.texi (BOM Handling): New node.

* test-suite/tests/ports.test ("set-port-encoding!, wrong encoding"):
  Adapt test to cope with the fact that 'set-port-encoding!' does not
  immediately open the iconv descriptors.
  (bv-read-test): New procedure.
  ("unicode byte-order marks (BOMs)"): New test prefix.
---
 doc/ref/api-io.texi |

Re: array-copy! is slow & array-map.c

2013-04-03 Thread Ludovic Courtès

Daniel Llorens  skribis:

> From 0392b6ceab229d8d6546e1ff79e38d3c11881e04 Mon Sep 17 00:00:00 2001
> From: Daniel Llorens 
> Date: Tue, 2 Apr 2013 16:43:37 +0200
> Subject: [PATCH 4/5] Remove double indirection for 1st arg of array-for-each
>
> * libguile/array-map.c: (rafe): factor GVREF out of rank-1 loop for ra0.

Applied, after adding tests for rank-1 ‘array-for-each’ (which was not
covered, according to
.).

Ludo’.

Re: redo-safe-variables and redo-safe-parameters

2013-04-03 Thread Stefan Israelsson Tampe

> Noah writes
> Hi,

> A few points:

> 1. I really, really think that it is a bad idea for the type of a
> variable to change depending on how it is used (i.e. set!
> vs. set~). That means that you should remove points 12 and 14, and
> maybe some  other points.

I agree with you now, I was a bit too creative there :-)

> 2. You shouldn't specify the semantics in terms of code, but by a 
> description. That means removing points 3, 4, 5, and 7 and replacing
> them with text that says how the variables work. You can move the
> same ideas down to the reference implementation, though - that is
> good.

I will get there in the end; meanwhile, I'd like to keep some code for
clarity.

> 3. I strongly suspect that you will find that MIT Scheme's fluid-let
> has the semantics you want. If it doesn't, I would be interested to
> see an example that shows the difference between that and the type
> of variables that you want.

I think that fluid parameters and fluid-let are close in semantics, but I
do not want the backtracking feature of fluid-let in some
applications, and it carries an overhead compared to just having
variables and an init value.

Note:
I have been going back to guile-log to try to implement the ideas in
actual code. I got it working and implemented it in efficient C, I
hope. The code for e.g. any-interleave got 8x faster (now about 10x
slower than a simple all, instead of 80x). My conclusion now is that to
get an efficient implementation one needs to go back to dynamic-wind
guards, and demand that redo-safeness implies that we assume the
code backtracks over the dynamic-wind at the state storage point. This
is featureful enough to satisfy most uses of redo/undo that I can
think of.

/Stefan

Re: array-copy! is slow & array-map.c

2013-04-03 Thread Ludovic Courtès

Daniel Llorens  skribis:

> From 03aa7ea2e61b419d38c6ca8de75ee34e5b55d28a Mon Sep 17 00:00:00 2001
> From: Daniel Llorens 
> Date: Wed, 3 Apr 2013 16:19:17 +0200
> Subject: [PATCH 5/5] Remove double indirection in array-fill!
>
> * libguile/array-map.c, libguile/array-map.h: deprecate scm_array_fill_int.
> * libguile/array-map.c: new function rafill, like scm_array_fill_int, but
>   factors GVSET out of loop. Use this in scm_array_fill_x instead of
>   scm_array_fill_int.
> * test-suite/tests/arrays.test: test array-fill! with noncompact array.
> * doc/guile-api.alist: unlist scm_array_fill_int.
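A test with a non-compact array matters for `array-fill!` because such a view reaches its elements through strides that skip over memory. A fill must therefore walk the strides rather than assume one contiguous run. The sketch below uses a hypothetical rank-2 view type (`view2`, `view2_fill`) and stores directly in the inner rank-1 loop, in the spirit of the commit message. It is not Guile's `rafill`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rank-2 view into a flat buffer: element (i, j) lives at
   base[i*inc0 + j*inc1].  A transposed view or a column of a row-major
   matrix is non-compact.  */
struct view2
{
  double *base;
  size_t dim0, dim1;
  ptrdiff_t inc0, inc1;
};

static void
view2_fill (const struct view2 *v, double fill)
{
  assert (v->base != NULL);
  for (size_t i = 0; i < v->dim0; i++)
    {
      /* Outer loop: locate the rank-1 slice once.  */
      double *slice = v->base + (ptrdiff_t) i * v->inc0;
      /* Inner rank-1 loop: plain strided stores, no generic setter.  */
      for (size_t j = 0; j < v->dim1; j++)
        slice[(ptrdiff_t) j * v->inc1] = fill;
    }
}
```

For example, a transposed view of the top-left 2x2 block of a row-major 3x3 matrix has `inc0 = 1` and `inc1 = 3`. Filling it touches elements 0, 1, 3 and 4 of the underlying buffer and leaves the rest alone.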

Why deprecate scm_array_fill_int?  Seems to me that there’s nothing
wrong about it, no?

Ludo’.

Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs)

2013-04-03 Thread Ludovic Courtès

Mark H Weaver  skribis:

> l...@gnu.org (Ludovic Courtès) writes:
>> Woow, well thought out.  The semantics seem good.  (It’s interesting to
>> see how BOMs complicate things, but that’s life, I guess.)
>>
>> The patch looks good to me.  The test suite is nice.  It doesn’t seem to
>> cover all the corner cases listed above, but that can be added later on
>> perhaps?
>
> Yes, the tests are still a work-in-progress, but I've added quite a few
> more since you last looked.

Nice.

>> Perhaps the text above could be added to the manual,
>
> In the attached patch, I've added a new node to the "Input and Output"
> section.

Perfect.

>>> +{
>>> +  scm_t_port *pt = SCM_PTAB_ENTRY (port);
>>> +  int result;
>>> +  int i = 0;
>>> +
>>> +  while (i < len && scm_peek_byte_or_eof (port) == bytes[i])
>>> +    {
>>> +      pt->read_pos++;
>>> +      i++;
>>> +    }
>>> +
>>> +  result = (i == len);
>>> +
>>> +  while (i > 0)
>>> +    scm_unget_byte (bytes[--i], port);
>>> +
>>> +  return result;
>>> +}
>>
>> Should it be scm_get_byte_or_eof given that scm_unget_byte is used later?
>
> Yes.  Bytes are only consumed if they are equal to bytes[i], so an EOF will
> never be consumed or passed to scm_unget_byte.
>
>> What if pt->read_buf_size == 1?  What if there’s data in saved_read_buf?
>
> All of those details are handled by 'scm_peek_byte_or_eof', which is
> guaranteed to leave 'pt->read_pos' pointing at the byte that's returned
> (if not EOF).  Therefore, it's always safe to increment that pointer by
> one (but no more than one) after calling 'scm_peek_byte_or_eof' if it
> returned non-EOF.
>
> Look at the code for 'scm_peek_byte_or_eof' and this will be clear.
> Also note that you did the same thing in 'scm_utf8_codepoint' :)

Ah yes, indeed.

[...]

> Here's the new patch.  Any more suggestions?

Not from me!  OK to commit as far as I’m concerned.

Thank you!

Ludo’.

Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs)

2013-04-03 Thread Mark H Weaver

l...@gnu.org (Ludovic Courtès) writes:

> Mark H Weaver  skribis:
>
>> Here's the new patch.  Any more suggestions?
>
> Not from me!  OK to commit as far as I’m concerned.

Great!  I'd still like to hear what Andy thinks.
I've attached a new version with some more tweaks.

Thanks,
  Mark

From f849f9a3f6babd87088d39369442a7f429762cec Mon Sep 17 00:00:00 2001
From: Mark H Weaver 
Date: Wed, 3 Apr 2013 04:22:04 -0400
Subject: [PATCH] Improve handling of Unicode byte-order marks (BOMs).

* libguile/ports-internal.h (struct scm_port_internal): Add new members
  'at_stream_start_for_bom_read' and 'at_stream_start_for_bom_write'.
  (SCM_UNICODE_BOM): New macro.
  (scm_i_port_iconv_descriptors): Add 'mode' parameter to prototype.

* libguile/ports.c (scm_new_port_table_entry): Initialize
  'at_stream_start_for_bom_read' and 'at_stream_start_for_bom_write'.
  (get_iconv_codepoint): Pass new 'mode' parameter to
  'scm_i_port_iconv_descriptors'.
  (get_codepoint): After reading a codepoint at stream start, record
  that we're no longer at stream start, and consume a BOM where
  appropriate.
  (scm_seek): Set the stream start flags according to the new position.
  (looking_at_bytes): New static function.
  (scm_utf8_bom, scm_utf16be_bom, scm_utf16le_bom, scm_utf32be_bom,
  scm_utf32le_bom): New static const arrays.
  (decide_utf16_encoding, decide_utf32_encoding): New static functions.
  (scm_i_port_iconv_descriptors): Add new 'mode' parameter.  If the
  specified encoding is UTF-16 or UTF-32, make that precise by deciding
  what endianness to use, and construct iconv descriptors based on the
  precise encoding.
  (scm_i_set_port_encoding_x): Record that we are now at stream start.
  Do not open the new iconv descriptors immediately; let them be
  initialized lazily.

* libguile/print.c (display_string_using_iconv): Record that we're no
  longer at stream start.  Write a BOM if appropriate.

* doc/ref/api-io.texi (BOM Handling): New node.

* test-suite/tests/ports.test ("set-port-encoding!, wrong encoding"):
  Adapt test to cope with the fact that 'set-port-encoding!' does not
  immediately open the iconv descriptors.
  (bv-read-test): New procedure.
  ("unicode byte-order marks (BOMs)"): New test prefix.
---
 doc/ref/api-io.texi         |   65 ++
 libguile/ports-internal.h   |    7 +-
 libguile/ports.c            |  146 ++
 libguile/print.c            |   18 ++-
 test-suite/tests/ports.test |  284 ++-
 5 files changed, 494 insertions(+), 26 deletions(-)

diff --git a/doc/ref/api-io.texi b/doc/ref/api-io.texi
index 8c974be..57afa37 100644
--- a/doc/ref/api-io.texi
+++ b/doc/ref/api-io.texi
@@ -19,6 +19,7 @@
 * Port Types::  Types of port and how to make them.
 * R6RS I/O Ports::  The R6RS port API.
 * I/O Extensions::  Using and extending ports in C.
+* BOM Handling::  Handling of Unicode byte order marks.
 @end menu

@@ -2373,6 +2374,70 @@ Set using

 @end table

+@node BOM Handling
+@subsection Handling of Unicode byte order marks.
+@cindex BOM
+@cindex byte order mark
+
+This section documents the finer points of Guile's handling of Unicode
+byte order marks (BOMs).  A byte order mark (U+FEFF) is typically found
+at the start of a UTF-16 or UTF-32 stream, so that the reader can
+reliably determine the byte order.  Occasionally, a BOM is found at the
+start of a UTF-8 stream, but this is much less common and not generally
+recommended.
+
+Guile attempts to handle BOMs automatically, and in accordance with the
+recommendations of the Unicode Standard, when the port encoding is set
+to @code{UTF-8}, @code{UTF-16}, or @code{UTF-32}.  In brief, Guile
+automatically writes a BOM at the start of a UTF-16 and UTF-32 stream,
+and automatically consumes one from the start of a UTF-8, UTF-16, or
+UTF-32 stream.
+
+As specified in the Unicode Standard, a BOM is only handled specially at
+the start of a stream, and only if the port encoding is set to
+@code{UTF-16} or @code{UTF-32}.  If the port encoding is set to
+@code{UTF-16BE}, @code{UTF-16LE}, @code{UTF-16BE}, or @code{UTF-16LE},
+then BOMs are @emph{not} handled specially, and none of the special
+handling described in this section applies.
+
+@itemize @bullet
+@item
+To ensure that Guile will properly detect the byte order of a
+@code{UTF-16} or @code{UTF-32} stream, you must perform a textual read
+before writing, seeking, or binary I/O.  Guile will not attempt to read
+a BOM until a read is explicitly requested at the start of the stream.
+
+@item
+If @code{set-port-encoding!} is called in the middle of a stream, Guile
+treats this as a new logical ``start of stream'' for purposes of BOM
+handling.  This is intended to multiple logical text streams embedded
+within a larger binary stream.
+
+@item
+Binary I/O operations are not guaranteed to update Guile's notion of
+whether the port is at the ``start of the stream'', nor are they
+g
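As an illustration of the read-side rules in the documentation above, the following C sketch decides a precise encoding from the first bytes of a stream. `decide_encoding` is a hypothetical helper, not the patch's `decide_utf16_encoding`/`decide_utf32_encoding` (which work on a port), and it assumes the caller has buffered up to four leading bytes. Following the Unicode Standard, a UTF-16 or UTF-32 stream without a BOM is treated as big endian, and the fixed-endian encodings such as UTF-16LE get no BOM processing at all.

```c
#include <stddef.h>
#include <string.h>

/* U+FEFF as it appears at the start of each encoding scheme. */
static const unsigned char utf8_bom[]    = { 0xEF, 0xBB, 0xBF };
static const unsigned char utf16be_bom[] = { 0xFE, 0xFF };
static const unsigned char utf16le_bom[] = { 0xFF, 0xFE };
static const unsigned char utf32be_bom[] = { 0x00, 0x00, 0xFE, 0xFF };
static const unsigned char utf32le_bom[] = { 0xFF, 0xFE, 0x00, 0x00 };

static int
has_prefix (const unsigned char *buf, size_t len,
            const unsigned char *prefix, size_t plen)
{
  return len >= plen && memcmp (buf, prefix, plen) == 0;
}

/* Given the port's declared encoding and the first LEN bytes of the
   stream, return the precise encoding to use and store in *SKIP the
   number of BOM bytes to consume.  Returns NULL for encodings that get
   no special BOM handling (e.g. "UTF-16LE"). */
static const char *
decide_encoding (const char *declared, const unsigned char *buf,
                 size_t len, size_t *skip)
{
  *skip = 0;
  if (strcmp (declared, "UTF-8") == 0)
    {
      if (has_prefix (buf, len, utf8_bom, sizeof utf8_bom))
        *skip = sizeof utf8_bom;
      return "UTF-8";
    }
  if (strcmp (declared, "UTF-16") == 0)
    {
      if (has_prefix (buf, len, utf16le_bom, sizeof utf16le_bom))
        {
          *skip = sizeof utf16le_bom;
          return "UTF-16LE";
        }
      if (has_prefix (buf, len, utf16be_bom, sizeof utf16be_bom))
        *skip = sizeof utf16be_bom;
      return "UTF-16BE";   /* big-endian BOM, or no BOM at all */
    }
  if (strcmp (declared, "UTF-32") == 0)
    {
      if (has_prefix (buf, len, utf32le_bom, sizeof utf32le_bom))
        {
          *skip = sizeof utf32le_bom;
          return "UTF-32LE";
        }
      if (has_prefix (buf, len, utf32be_bom, sizeof utf32be_bom))
        *skip = sizeof utf32be_bom;
      return "UTF-32BE";   /* big-endian BOM, or no BOM at all */
    }
  return NULL;
}
```

Since the UTF-32LE BOM begins with the UTF-16LE BOM, the two families are checked separately, keyed on the declared encoding rather than on the bytes alone.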

Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs)

2013-04-03 Thread Mike Gran

Hi Mark



>>>  Here's the new patch.  Any more suggestions?

There are a couple of lines in your doc patch that aren't quite right.

"@code{UTF-16BE}, @code{UTF-16LE}, @code{UTF-16BE}, or @code{UTF-16LE}"

I assume that two of these should be UTF-32.

Also

"This is intended to multiple logical text streams embedded
within a larger binary stream."

Should probably be

"This is intended to support multiple ..."

-Mike

Re: CPS Update

2013-04-03 Thread Noah Lavine

On Sat, Mar 9, 2013 at 3:31 AM, Andy Wingo  wrote:

> On Fri 08 Mar 2013 23:57, Noah Lavine  writes:
>
> > Somewhat shockingly, I think that's almost every language feature.
>
> Wow, nice work :-))
>
> Sounds like great stuff.  I hope to join you in this work once 2.0.8 is
> out.  It's really great that you're working on this!
>

Thanks a lot. I am not an expert by any means, and it would be great to
have someone like you who really knows what they're doing hacking at it.

Noah

Re: array-copy! is slow & array-map.c

2013-04-03 Thread Daniel Llorens


Attached are, first, a patch with some array-copy! tests, and then the
array-fill! patch split into two parts.

Thanks!

Daniel



0001-Tests-for-array-copy.patch
Description: Binary data


0002-Remove-double-indirection-in-array-fill.patch
Description: Binary data


0003-Deprecate-scm_array_fill_int.patch
Description: Binary data

Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs)

2013-04-03 Thread Mark H Weaver

Thanks for the review, Mike.  I've attached a new patch with those
problems (and a few others) fixed.

 Mark

From a373927201028915f7b8cd5a1c72c5819cb4797c Mon Sep 17 00:00:00 2001
From: Mark H Weaver 
Date: Wed, 3 Apr 2013 04:22:04 -0400
Subject: [PATCH] Improve handling of Unicode byte-order marks (BOMs).

* libguile/ports-internal.h (struct scm_port_internal): Add new members
  'at_stream_start_for_bom_read' and 'at_stream_start_for_bom_write'.
  (SCM_UNICODE_BOM): New macro.
  (scm_i_port_iconv_descriptors): Add 'mode' parameter to prototype.

* libguile/ports.c (scm_new_port_table_entry): Initialize
  'at_stream_start_for_bom_read' and 'at_stream_start_for_bom_write'.
  (get_iconv_codepoint): Pass new 'mode' parameter to
  'scm_i_port_iconv_descriptors'.
  (get_codepoint): After reading a codepoint at stream start, record
  that we're no longer at stream start, and consume a BOM where
  appropriate.
  (scm_seek): Set the stream start flags according to the new position.
  (looking_at_bytes): New static function.
  (scm_utf8_bom, scm_utf16be_bom, scm_utf16le_bom, scm_utf32be_bom,
  scm_utf32le_bom): New static const arrays.
  (decide_utf16_encoding, decide_utf32_encoding): New static functions.
  (scm_i_port_iconv_descriptors): Add new 'mode' parameter.  If the
  specified encoding is UTF-16 or UTF-32, make that precise by deciding
  what endianness to use, and construct iconv descriptors based on the
  precise encoding.
  (scm_i_set_port_encoding_x): Record that we are now at stream start.
  Do not open the new iconv descriptors immediately; let them be
  initialized lazily.

* libguile/print.c (display_string_using_iconv): Record that we're no
  longer at stream start.  Write a BOM if appropriate.

* doc/ref/api-io.texi (BOM Handling): New node.

* test-suite/tests/ports.test ("set-port-encoding!, wrong encoding"):
  Adapt test to cope with the fact that 'set-port-encoding!' does not
  immediately open the iconv descriptors.
  (bv-read-test): New procedure.
  ("unicode byte-order marks (BOMs)"): New test prefix.
---
 doc/ref/api-io.texi         |   64 ++
 libguile/ports-internal.h   |    7 +-
 libguile/ports.c            |  146 ++
 libguile/print.c            |   18 ++-
 test-suite/tests/ports.test |  284 ++-
 5 files changed, 493 insertions(+), 26 deletions(-)

diff --git a/doc/ref/api-io.texi b/doc/ref/api-io.texi
index 8c974be..abf9cbd 100644
--- a/doc/ref/api-io.texi
+++ b/doc/ref/api-io.texi
@@ -19,6 +19,7 @@
 * Port Types::  Types of port and how to make them.
 * R6RS I/O Ports::  The R6RS port API.
 * I/O Extensions::  Using and extending ports in C.
+* BOM Handling::  Handling of Unicode byte order marks.
 @end menu

@@ -2373,6 +2374,69 @@ Set using

 @end table

+@node BOM Handling
+@subsection Handling of Unicode byte order marks.
+@cindex BOM
+@cindex byte order mark
+
+This section documents the finer points of Guile's handling of Unicode
+byte order marks (BOMs).  A byte order mark (U+FEFF) is typically found
+at the start of a UTF-16 or UTF-32 stream, so that the reader can
+reliably determine the byte order.  Occasionally, a BOM is found at the
+start of a UTF-8 stream, but this is much less common and not generally
+recommended.
+
+Guile attempts to handle BOMs automatically, and in accordance with the
+recommendations of the Unicode Standard, when the port encoding is set
+to @code{UTF-8}, @code{UTF-16}, or @code{UTF-32}.  In brief, Guile
+automatically writes a BOM at the start of a UTF-16 and UTF-32 stream,
+and automatically consumes one from the start of a UTF-8, UTF-16, or
+UTF-32 stream.
+
+As specified in the Unicode Standard, a BOM is only handled specially at
+the start of a stream, and only if the port encoding is set to
+@code{UTF-8}, @code{UTF-16} or @code{UTF-32}.  If the port encoding is
+set to @code{UTF-16BE}, @code{UTF-16LE}, @code{UTF-32BE}, or
+@code{UTF-32LE}, then BOMs are @emph{not} handled specially, and none of
+the special handling described in this section applies.
+
+@itemize @bullet
+@item
+To ensure that Guile will properly detect the byte order of a
+@code{UTF-16} or @code{UTF-32} stream, you must perform a textual read
+before writing, seeking, or binary I/O.  Guile will not attempt to read
+a BOM until a read is explicitly requested at the start of the stream.
+
+@item
+If @code{set-port-encoding!} is called in the middle of a stream, Guile
+treats this as a new logical ``start of stream'' for purposes of BOM
+handling.  This is intended to support multiple logical text streams
+embedded within a larger binary stream.
+
+@item
+Binary I/O operations are not guaranteed to update Guile's notion of
+whether the port is at the ``start of the stream'', nor are they
+guaranteed to produce or consume BOMs.
+
+@item
+For ports that support seeking (e.g. normal files), the input and output
+streams are considered linked: if the us
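The stream-start bookkeeping described in this revision can be modelled with the two flags named in the ChangeLog, `at_stream_start_for_bom_read` and `at_stream_start_for_bom_write`. The sketch below is a simplified model under stated assumptions: the `on_*` functions are hypothetical, `linked` stands for a port with random access (the `rw_random` case), and a seek is assumed to put both directions back at stream start only when the new position is 0.

```c
#include <stdbool.h>

/* Model of the two per-port flags listed in the ChangeLog. */
struct bom_state
{
  bool at_stream_start_for_bom_read;
  bool at_stream_start_for_bom_write;
};

/* set-port-encoding!: a new logical start of stream in both directions. */
static void
on_set_encoding (struct bom_state *s)
{
  s->at_stream_start_for_bom_read = true;
  s->at_stream_start_for_bom_write = true;
}

/* First textual read.  Returns true if a BOM should be looked for.
   For linked (random-access) ports, the write side leaves stream start
   too, so no BOM will be written later. */
static bool
on_textual_read (struct bom_state *s, bool linked)
{
  bool check = s->at_stream_start_for_bom_read;
  s->at_stream_start_for_bom_read = false;
  if (linked)
    s->at_stream_start_for_bom_write = false;
  return check;
}

/* First textual write.  Returns true if a BOM should be emitted. */
static bool
on_textual_write (struct bom_state *s, bool linked)
{
  bool emit = s->at_stream_start_for_bom_write;
  s->at_stream_start_for_bom_write = false;
  if (linked)
    s->at_stream_start_for_bom_read = false;
  return emit;
}

/* Seek: assumed to reset both flags according to the new position. */
static void
on_seek (struct bom_state *s, long new_pos)
{
  bool at_start = (new_pos == 0);
  s->at_stream_start_for_bom_read = at_start;
  s->at_stream_start_for_bom_write = at_start;
}
```

With independent streams (a socket or pipe, say) the first read may consume a BOM *and* the first write may produce one; with linked streams only whichever operation comes first gets BOM treatment.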

[PATCH] Add backlog option to http-open

2013-04-03 Thread Nala Ginrut

Here's a patch to add a backlog option to http-open.  Users may use it
like this:

---cut
(run-server (lambda (r b) ...)
            'http
            '(#:port 1234 #:backlog 1024))
---end

I don't think it's necessary to add docs, since the option is
self-explanatory.  It may help people like me. ;-)

Happy hacking!

From b75008bd60967be9935ef6e7bb146832cf852ab3 Mon Sep 17 00:00:00 2001
From: Nala Ginrut 
Date: Thu, 4 Apr 2013 12:33:09 +0800
Subject: [PATCH] Add backlog option to http-open

* module/web/server/http.scm (http-open): Add #:backlog keyword argument,
  so that users may specify the listen backlog of the server socket.
---
 module/web/server/http.scm |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/module/web/server/http.scm b/module/web/server/http.scm
index cda44f4..c814286 100644
--- a/module/web/server/http.scm
+++ b/module/web/server/http.scm
@@ -62,8 +62,9 @@
   (inet-pton family host)
   INADDR_LOOPBACK))
 (port 8080)
+(backlog 128)
 (socket (make-default-socket family addr port)))
-  (listen socket 128)
+  (listen socket backlog)
   (sigaction SIGPIPE SIG_IGN)
   (let ((poll-set (make-empty-poll-set)))
 (poll-set-add! poll-set socket *events*)
-- 
1.7.10.4
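The backlog value ends up as the second argument of the `listen` call on the server socket. Here is a minimal C sketch of the equivalent socket setup, assuming a loopback-only IPv4 server like `http-open`'s default; the helper name `open_listener` is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket bound to 127.0.0.1:PORT and listen on it with the
   given BACKLOG (the maximum queue of pending connections).  Returns the
   socket descriptor, or -1 on failure. */
static int
open_listener (uint16_t port, int backlog)
{
  struct sockaddr_in addr;
  int fd = socket (AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
    return -1;

  memset (&addr, 0, sizeof addr);
  addr.sin_family = AF_INET;
  addr.sin_port = htons (port);
  addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);

  if (bind (fd, (struct sockaddr *) &addr, sizeof addr) < 0
      || listen (fd, backlog) < 0)
    {
      close (fd);
      return -1;
    }
  return fd;
}
```

On Linux, a backlog larger than `net.core.somaxconn` is silently truncated to that limit, so a large `#:backlog` may also need a matching sysctl setting to take full effect.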

Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs)

2013-04-03 Thread Mark H Weaver

Here's the latest revision of the patch.  The only thing that has
changed is the documentation.

 Mark

From a3f2c379f11782f0440d9beb2b40601146ee14ea Mon Sep 17 00:00:00 2001
From: Mark H Weaver 
Date: Wed, 3 Apr 2013 04:22:04 -0400
Subject: [PATCH] Improve handling of Unicode byte-order marks (BOMs).

* libguile/ports-internal.h (struct scm_port_internal): Add new members
  'at_stream_start_for_bom_read' and 'at_stream_start_for_bom_write'.
  (SCM_UNICODE_BOM): New macro.
  (scm_i_port_iconv_descriptors): Add 'mode' parameter to prototype.

* libguile/ports.c (scm_new_port_table_entry): Initialize
  'at_stream_start_for_bom_read' and 'at_stream_start_for_bom_write'.
  (get_iconv_codepoint): Pass new 'mode' parameter to
  'scm_i_port_iconv_descriptors'.
  (get_codepoint): After reading a codepoint at stream start, record
  that we're no longer at stream start, and consume a BOM where
  appropriate.
  (scm_seek): Set the stream start flags according to the new position.
  (looking_at_bytes): New static function.
  (scm_utf8_bom, scm_utf16be_bom, scm_utf16le_bom, scm_utf32be_bom,
  scm_utf32le_bom): New static const arrays.
  (decide_utf16_encoding, decide_utf32_encoding): New static functions.
  (scm_i_port_iconv_descriptors): Add new 'mode' parameter.  If the
  specified encoding is UTF-16 or UTF-32, make that precise by deciding
  what endianness to use, and construct iconv descriptors based on the
  precise encoding.
  (scm_i_set_port_encoding_x): Record that we are now at stream start.
  Do not open the new iconv descriptors immediately; let them be
  initialized lazily.

* libguile/print.c (display_string_using_iconv): Record that we're no
  longer at stream start.  Write a BOM if appropriate.

* doc/ref/api-io.texi (BOM Handling): New node.

* test-suite/tests/ports.test ("set-port-encoding!, wrong encoding"):
  Adapt test to cope with the fact that 'set-port-encoding!' does not
  immediately open the iconv descriptors.
  (bv-read-test): New procedure.
  ("unicode byte-order marks (BOMs)"): New test prefix.
---
 doc/ref/api-io.texi         |   81 +++-
 libguile/ports-internal.h   |    7 +-
 libguile/ports.c            |  146 ++
 libguile/print.c            |   18 ++-
 test-suite/tests/ports.test |  284 ++-
 5 files changed, 509 insertions(+), 27 deletions(-)

diff --git a/doc/ref/api-io.texi b/doc/ref/api-io.texi
index 8c974be..3f75a63 100644
--- a/doc/ref/api-io.texi
+++ b/doc/ref/api-io.texi
@@ -1,7 +1,7 @@
 @c -*-texinfo-*-
 @c This is part of the GNU Guile Reference Manual.
 @c Copyright (C)  1996, 1997, 2000, 2001, 2002, 2003, 2004, 2007, 2009,
-@c   2010, 2011  Free Software Foundation, Inc.
+@c   2010, 2011, 2013  Free Software Foundation, Inc.
 @c See the file guile.texi for copying conditions.

 @node Input and Output
@@ -19,6 +19,7 @@
 * Port Types::  Types of port and how to make them.
 * R6RS I/O Ports::  The R6RS port API.
 * I/O Extensions::  Using and extending ports in C.
+* BOM Handling::  Handling of Unicode byte order marks.
 @end menu

@@ -2373,6 +2374,84 @@ Set using

 @end table

+@node BOM Handling
+@subsection Handling of Unicode byte order marks.
+@cindex BOM
+@cindex byte order mark
+
+This section documents the finer points of Guile's handling of Unicode
+byte order marks (BOMs).  A byte order mark (U+FEFF) is typically found
+at the start of a UTF-16 or UTF-32 stream, to allow readers to reliably
+determine the byte order.  Occasionally, a BOM is found at the start of
+a UTF-8 stream, but this is much less common and not generally
+recommended.
+
+Guile attempts to handle BOMs automatically, and in accordance with the
+recommendations of the Unicode Standard, when the port encoding is set
+to @code{UTF-8}, @code{UTF-16}, or @code{UTF-32}.  In brief, Guile
+automatically writes a BOM at the start of a UTF-16 or UTF-32 stream,
+and automatically consumes one from the start of a UTF-8, UTF-16, or
+UTF-32 stream.
+
+As specified in the Unicode Standard, a BOM is only handled specially at
+the start of a stream, and only if the port encoding is set to
+@code{UTF-8}, @code{UTF-16} or @code{UTF-32}.  If the port encoding is
+set to @code{UTF-16BE}, @code{UTF-16LE}, @code{UTF-32BE}, or
+@code{UTF-32LE}, then BOMs are @emph{not} handled specially, and none of
+the special handling described in this section applies.
+
+@itemize @bullet
+@item
+To ensure that Guile will properly detect the byte order of a UTF-16 or
+UTF-32 stream, you must perform a textual read before any writes, seeks,
+or binary I/O.  Guile will not attempt to read a BOM unless a read is
+explicitly requested at the start of the stream.
+
+@item
+If a textual write is performed before the first read, then an arbitrary
+byte order will be chosen.  Currently, big endian is the default on all
+platforms, but that may change in the future.  If you wish to explicitly
+control the byte ord
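To make the write-before-read case concrete, the sketch below shows the bytes a UTF-16 port would be expected to produce when the first operation is a write and big endian is chosen: a BOM (FE FF) followed by UTF-16BE code units. Guile itself goes through iconv with `UTF-16BE`; the hand-written encoder `write_utf16be` is a hypothetical illustration and does not reject surrogate or out-of-range code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode code points CPS[0..N) as UTF-16BE into OUT, preceded by a BOM
   (FE FF) when AT_STREAM_START is nonzero.  Code points above U+FFFF are
   written as surrogate pairs.  OUT must have room for 2 + 4 * N bytes.
   Returns the number of bytes written. */
static size_t
write_utf16be (const uint32_t *cps, size_t n, int at_stream_start,
               unsigned char *out)
{
  size_t k = 0;
  size_t i;

  if (at_stream_start)
    {
      out[k++] = 0xFE;
      out[k++] = 0xFF;
    }

  for (i = 0; i < n; i++)
    {
      uint32_t c = cps[i];
      if (c >= 0x10000)
        {
          uint32_t v = c - 0x10000;
          uint16_t hi = (uint16_t) (0xD800 | (v >> 10));
          uint16_t lo = (uint16_t) (0xDC00 | (v & 0x3FF));
          out[k++] = (unsigned char) (hi >> 8);
          out[k++] = (unsigned char) (hi & 0xFF);
          out[k++] = (unsigned char) (lo >> 8);
          out[k++] = (unsigned char) (lo & 0xFF);
        }
      else
        {
          out[k++] = (unsigned char) (c >> 8);
          out[k++] = (unsigned char) (c & 0xFF);
        }
    }
  return k;
}
```

For example, writing "A" at stream start yields FE FF 00 41, and U+1F600 written later yields D8 3D DE 00.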

Re: [PATCH] Move let/ec to top-level

2013-04-03 Thread Nala Ginrut

On Wed, 2013-03-27 at 22:14 +0100, Ludovic Courtès wrote:
> Sounds good to me, but can you also (1) add doc, probably under “Prompt
> Primitives”, with cross-refs from the “Exceptions” section, and (2)
> write a ChangeLog-style commit log?
> 
> Thanks,
> Ludo’.
> 
> 

Add call/ec and let/ec to (ice-9 control) with docs in the manual.

Thanks!

From 0c899e96d9667a88ceb17cfbcdedf3e18aeef21c Mon Sep 17 00:00:00 2001
From: Nala Ginrut 
Date: Thu, 4 Apr 2013 14:14:25 +0800
Subject: [PATCH] Add escape continuation with docs

* doc/ref/api-control.texi: Document call/ec and let/ec.
* module/ice-9/control.scm (call/ec, let/ec): New exports.
* module/ice-9/futures.scm (let/ec): Remove the old implementation; use
  the one from (ice-9 control).
---
 doc/ref/api-control.texi |   23 +++
 module/ice-9/control.scm |   27 +--
 module/ice-9/futures.scm |   11 +--
 3 files changed, 49 insertions(+), 12 deletions(-)

diff --git a/doc/ref/api-control.texi b/doc/ref/api-control.texi
index 320812d..9ddeeb0 100644
--- a/doc/ref/api-control.texi
+++ b/doc/ref/api-control.texi
@@ -577,6 +577,29 @@ Before moving on, we should mention that if the handler of a prompt is a
 that prompt will not cause a continuation to be reified. This can be an
 important efficiency consideration to keep in mind.
 
+@deffn {Scheme Procedure} call/ec proc
+Call @var{proc} with an @dfn{escape continuation}: a continuation that
+can only be used to return early from the dynamic extent of the call.
+@code{call/ec} is short for @code{call-with-escape-continuation}.  It is
+implemented with a prompt, so invoking the escape continuation aborts to
+that prompt.  When a continuation is only needed for escaping,
+@code{call/ec} is often a cheaper replacement for @code{call/cc}.  For
+more details, see
+@uref{http://www.cs.indiana.edu/~bruggema/one-shots-abstract.html,
+Representing control in the presence of one-shot continuations}.
+@end deffn
+
+@deffn {Scheme Syntax} let/ec k body @dots{}
+Equivalent to @code{(call/ec (lambda (k) body @dots{}))}.
+@end deffn
+
+@example
+(use-modules (ice-9 control))
+
+(call/ec (lambda (return)
+           (return 123)))
+
+(let/ec return (return 123))
+@end example
+
+
 @node Shift and Reset
 @subsubsection Shift, Reset, and All That
 
diff --git a/module/ice-9/control.scm b/module/ice-9/control.scm
index 5f25738..4a114fd 100644
--- a/module/ice-9/control.scm
+++ b/module/ice-9/control.scm
@@ -1,6 +1,6 @@
 ;;; Beyond call/cc
 
-;; Copyright (C) 2010, 2011 Free Software Foundation, Inc.
+;; Copyright (C) 2010, 2011, 2013 Free Software Foundation, Inc.
 
  This library is free software; you can redistribute it and/or
  modify it under the terms of the GNU Lesser General Public
@@ -21,7 +21,7 @@
 (define-module (ice-9 control)
   #:re-export (call-with-prompt abort-to-prompt
default-prompt-tag make-prompt-tag)
-  #:export (% abort shift reset shift* reset*))
+  #:export (% abort shift reset shift* reset* call/ec let/ec))
 
 (define (abort . args)
   (apply abort-to-prompt (default-prompt-tag) args))
@@ -76,3 +76,26 @@
 
 (define (shift* fc)
   (shift c (fc c)))
+
+(define %call/ec-prompt
+  (make-prompt-tag))
+
+(define (call/ec proc)
+  ;; aka. `call-with-escape-continuation'
+  (let ((tag (make-prompt-tag)))
+    (call-with-prompt
+     tag
+     (lambda ()
+       (proc (lambda args (apply abort-to-prompt tag args))))
+     (lambda (_ . args)
+       (apply values args)))))
+
+(define-syntax-rule (let/ec k e e* ...)
+  ;; aka. `let-escape-continuation'
+  (let ((tag (make-prompt-tag)))
+    (call-with-prompt
+     tag
+     (lambda ()
+       (let ((k (lambda args (apply abort-to-prompt tag args))))
+         e e* ...))
+     (lambda (_ . results) (apply values results)))))
diff --git a/module/ice-9/futures.scm b/module/ice-9/futures.scm
index 35a36ca..90bbe53 100644
--- a/module/ice-9/futures.scm
+++ b/module/ice-9/futures.scm
@@ -23,6 +23,7 @@
   #:use-module (srfi srfi-11)
   #:use-module (ice-9 q)
   #:use-module (ice-9 match)
+  #:use-module (ice-9 control)
   #:export (future make-future future? touch))
 
 ;;; Author: Ludovic Courtès 
@@ -105,16 +106,6 @@ touched."
   (lambda () (begin e0 e1 ...))
   (lambda () (unlock-mutex x)
 
-(define-syntax-rule (let/ec k e e* ...)   ; TODO: move to core
-  (let ((tag (make-prompt-tag)))
-    (call-with-prompt
-     tag
-     (lambda ()
-       (let ((k (lambda args (apply abort-to-prompt tag args))))
-         e e* ...))
-     (lambda (_ res) res))))
-
-
 (define %future-prompt
   ;; The prompt futures abort to when they want to wait for another
   ;; future.
-- 
1.7.10.4
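For C programmers, an escape continuation is roughly analogous to `setjmp`/`longjmp`: it can only be used to jump outward to a frame that is still active. The sketch below is an analogy of ours, not part of the patch; `struct escape`, `escape_with`, `visit` and `first_negative` are hypothetical names. `first_negative` plays the role of `(let/ec return ...)` and `escape_with` the role of `(return i)`.

```c
#include <setjmp.h>
#include <stddef.h>

/* Caller-owned "escape continuation": a jump buffer plus a slot for the
   value returned through it.  The slot is volatile because it is
   modified between setjmp and longjmp and read afterwards. */
struct escape
{
  jmp_buf buf;
  volatile long value;
};

/* Invoke the escape: store VALUE and jump back to the matching setjmp. */
static void
escape_with (struct escape *k, long value)
{
  k->value = value;
  longjmp (k->buf, 1);
}

/* A traversal that knows nothing about its caller; it escapes as soon
   as it sees a negative element. */
static void
visit (const int *xs, size_t n, struct escape *k)
{
  for (size_t i = 0; i < n; i++)
    if (xs[i] < 0)
      escape_with (k, (long) i);
}

/* Return the index of the first negative element of XS, or -1. */
static long
first_negative (const int *xs, size_t n)
{
  struct escape k;

  if (setjmp (k.buf) != 0)
    return k.value;   /* reached only via escape_with */
  visit (xs, n, &k);
  return -1;
}
```

Unlike `abort-to-prompt`, `longjmp` does not run `dynamic-wind`-style cleanup handlers on the way out, so the analogy covers only the transfer of control.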
