Re: make check fails if no en_US.iso88591 locale

2009-09-10 Thread Ludovic Courtès
Hello!

I built today’s ‘master’ on a ppc64 box and there are many
regexp-related errors and a surprisingly high number of unresolved
regexp-related tests:

  http://autobuild.josefsson.org/guile/log-200909100539539848000.txt

This machine only has the following locales:

  C
  en_US.utf8
  POSIX

Thanks,
Ludo’.





Re: make check fails if no en_US.iso88591 locale

2009-09-10 Thread Mike Gran
On Thu, 2009-09-10 at 12:27 +0200, Ludovic Courtès wrote:
> Hello!
> 
> I built today’s ‘master’ on a ppc64 box and there are many
> regexp-related errors and a surprisingly high number of unresolved
> regexp-related tests:
> 
>   http://autobuild.josefsson.org/guile/log-200909100539539848000.txt
> 
> This machine only has the following locales:
> 
>   C
>   en_US.utf8
>   POSIX
> 

I'm not surprised to see the unresolved tests, since I'd wrapped a lot of
them to throw unresolved if a Latin-1 locale wasn't found.  The
errors are a surprise: they indicate that my strategy for wrapping in a
Latin-1 locale isn't correct.

The reason for declaring a Latin-1 locale was to allow
scm_to/from_locale_string to convert a scheme string with values from 0
to 255 to an 8-bit binary C string.  The regexp.test runs on arbitrary
binary data which wasn't a problem in guile-1.8 since
scm_to/from_locale_string did no real locale conversion.

I could fix the test by testing only characters 0 to 127 in a C locale
if a Latin-1 locale can't be found.  I can also fix the test by using
the 'setbinary' function to force the encodings on stdin and stdout to a
default value that will pass through binary data, instead of calling
'setlocale'.  The procedure 'setbinary' was always a hack, and I kind of
want to get rid of it, but this is why it was created.
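
The first option (fall back to characters 0 to 127 when no Latin-1
locale is installed) could be sketched in C roughly as follows.  This is
only an illustration of the logic, since regexp.test itself is Scheme.
The function name and the list of candidate locale names are
assumptions; only en_US.iso88591 is actually named in this thread.

```c
#include <locale.h>
#include <stddef.h>

/* Candidate Latin-1 locale names.  Hypothetical list: the spelling of
   locale names varies between systems.  */
static const char *const latin1_locales[] = {
  "en_US.iso88591", "en_US.ISO-8859-1", "en_US.ISO8859-1"
};

/* Try to switch to a Latin-1 locale.  Return the highest character
   code the tests may use: 255 if a Latin-1 locale was installed,
   127 (plain ASCII, in the C locale) otherwise.  */
int
highest_testable_char (void)
{
  size_t i;

  for (i = 0; i < sizeof latin1_locales / sizeof latin1_locales[0]; i++)
    if (setlocale (LC_ALL, latin1_locales[i]) != NULL)
      return 255;

  /* No Latin-1 locale found: make sure we are in the C locale.  */
  setlocale (LC_ALL, "C");
  return 127;
}
```

A test driver would call highest_testable_char () once and loop only up
to the returned value instead of throwing unresolved.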

I looked in the POSIX spec on Regex for specific advice using 128-255 in
regex in the C locale.  I didn't see anything offhand.  The spec does
spend a lot of time talking about the interaction between the locale and
regular expressions.  I get the impression from the spec that using
regex on 128-255 in the C locale is an unexpected use of regular
expressions.

Thanks,
Mike





Re: make check fails if no en_US.iso88591 locale

2009-09-10 Thread Ludovic Courtès
Mike Gran writes:

> I could fix the test by testing only characters 0 to 127 in a C locale
> if a Latin-1 locale can't be found.

Yes, that'd be nice.

> I can also fix the test by using the 'setbinary' function

--8<---cut here---start->8---
scheme@(guile-user)> (help setbinary)
`setbinary' is a primitive procedure in the (guile) module.

 -- Scheme Procedure: setbinary
 Sets the encoding for the current input, output, and error ports
 to ISO-8859-1.  That character encoding allows ports to operate on
 binary data.

 It also sets the default encoding for newly created ports to
 ISO-8859-1.

 The previous default encoding for new ports is returned
--8<---cut here---end--->8---

It seems to do a lot of things, which aren't clear from the name.  ;-)

What can be done about it?

At least it should be renamed, to `set-port-binary-mode!' or similar.

Then it'd be nice if that functionality could be split in several
functions, some operating on a per-port basis.  After all, one can
already do:

  (for-each (lambda (p)
              (set-port-encoding! p "ISO-8859-1"))
            (list (current-input-port) (current-output-port)
                  (current-error-port)))

So we just lack:

  ;; encoding for newly created ports
  (set-default-port-encoding! "ISO-8859-1")

With that `setbinary' can be implemented in Scheme.

> to force the encodings on stdin and stdout to a default value that
> will pass through binary data, instead of calling 'setlocale'.

Hmm, I think I'd still prefer `setlocale'.

regexec(3) doesn't say anything about the string encoding.  Do libc
implementations actually expect plain ASCII or Latin-1?  Or do they
adapt to the current locale's encoding?

> I looked in the POSIX spec on Regex for specific advice using 128-255 in
> regex in the C locale.  I didn't see anything offhand.  The spec does
> spend a lot of time talking about the interaction between the locale and
> regular expressions.  I get the impression from the spec that using
> regex on 128-255 in the C locale is an unexpected use of regular
> expressions.

http://www.opengroup.org/onlinepubs/9699919799/functions/regexec.html
reads:

  If, when regexec() is called, the locale is different from when the
  regular expression was compiled, the result is undefined.

It makes me think that, if a process runs with a UTF-8 locale and passes
raw UTF-8 bytes to regcomp(3) and regexec(3), it may work.

Hmm, the program below, with UTF-8-encoded source, works both with a
Latin-1 and a UTF-8 locale:

#include <stdlib.h>
#include <locale.h>
#include <regex.h>

int
main (int argc, char *argv[])
{
  regex_t rx;
  regmatch_t match;

  setlocale (LC_ALL, "fr_FR.utf8");

  regcomp (&rx, "ça", REG_EXTENDED);
  return regexec (&rx, "ça va ?", 1, &match, 0) == 0
    ? EXIT_SUCCESS : EXIT_FAILURE;
}

Do you think it would work to just leave `regexp.test' as it is in 1.8?

Thanks,
Ludo'.


λ the ultimate showcase

2009-09-10 Thread Ludovic Courtès
Hey,

Now that we have Unicode, let’s not put it to good use!

  (define-syntax λ
    (syntax-rules ()
      ((_ formals body ...)
       (lambda formals body ...))))

Should ‘boot-9.scm’ provide this macro?

Ludo’.





Re: λ the ultimate showcase

2009-09-10 Thread Ludovic Courtès
l...@gnu.org (Ludovic Courtès) writes:

> Now that we have Unicode, let’s not put it to good use!

Someone must have tampered with my message.  Of course, it should read
“let’s put it to good use”.

Ludo’.





Re: λ the ultimate showcase

2009-09-10 Thread Neil Jerram
l...@gnu.org (Ludovic Courtès) writes:

> Hey,
>
> Now that we have Unicode, let’s not put it to good use!
>
>   (define-syntax λ
>     (syntax-rules ()
>       ((_ formals body ...)
>        (lambda formals body ...))))

Can it be overridden?  Just in case someone writes an algorithm where
they'd really like to have λ as a variable?

(In other words, I guess, can define-syntax things in general be
overridden?)

> Should ‘boot-9.scm’ provide this macro?

If the answer to the above is Yes, definitely.

 Neil




Re: make check fails if no en_US.iso88591 locale

2009-09-10 Thread Neil Jerram
Mike Gran writes:

> I'm not much of a regex guy, but, here's a couple of examples.  First
> one that sort of works as expected.
>
> guile> (string-match "sé" "José") 
> ==> #("José" (2 . 5))
>
> Regex properly matches the word, but, the match struct (2 . 5) is
> referring to the bytes of the string, not the characters of the string.

That's with a UTF-8 locale, isn't it?  With latin-1 I suppose the
numbers would be (2 . 4), right?
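
The byte-versus-character point can be checked against the C regex API
directly, since regexec(3) reports rm_so and rm_eo as byte offsets.
Below is a minimal sketch; the helper name match_byte_span is made up
for illustration, and it assumes the process is still in the default C
locale and that libc passes bytes 128-255 through unchanged there
(glibc does, as far as I know).

```c
#include <regex.h>

/* Hypothetical helper: store the byte offsets of the first match of
   PATTERN in TEXT.  Return 0 on a match, nonzero otherwise.  */
int
match_byte_span (const char *pattern, const char *text,
                 int *start, int *end)
{
  regex_t rx;
  regmatch_t m;
  int status;

  if (regcomp (&rx, pattern, REG_EXTENDED) != 0)
    return -1;

  status = regexec (&rx, text, 1, &m, 0);
  if (status == 0)
    {
      *start = (int) m.rm_so;   /* byte offset of the match start */
      *end = (int) m.rm_eo;     /* byte offset just past the match */
    }

  regfree (&rx);
  return status;
}
```

Under those assumptions, matching "s\xc3\xa9" against "Jos\xc3\xa9"
(UTF-8 bytes) should give 2 and 5, while "s\xe9" against "Jos\xe9"
(Latin-1 bytes) should give 2 and 4.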

> Here's one that doesn't work as expected.
>
> guile> (string-match "[:lower:]" "Hi, mom")
> ==> #("Hi, mom" (5 . 6))
> guile> (string-match "[:lower:]" "Hí, móm")
> ==> #f
>
> Once you add accents on the vowels, nothing matches.
>
> Thanks,

Thank you!  Do you think it would be good to add these examples to the
manual?  (I'm happy to do that if so.)

   Neil







Re: BDW-GC branch updated

2009-09-10 Thread Neil Jerram
l...@gnu.org (Ludovic Courtès) writes:

>>> So now is a good time to test it and report back!  It requires libgc 7.1
>>> or later, which isn't packaged in Debian, although it was released in
>>> May 2008.
>>>
>> It's in experimental since recently; I assume its maintainer will upload
>> to unstable soonish.
>
> Good.

I just installed libgc1c2 and libgc-dev (both 1:7.1-3) on my Debian
stable/testing machine.  Apparently no problem there.

But there's still no pkgconfig for libgc, and so 

PKG_CHECK_MODULES([BDW_GC], [bdw-gc])

fails:

checking for BDW_GC... configure: error: Package requirements (bdw-gc) were not met:

No package 'bdw-gc' found

Am I missing some easy solution?

(I haven't tried the approach of setting BDW_GC_CFLAGS and BDW_GC_LIBS
yet.)

Neil




Re: make check fails if no en_US.iso88591 locale

2009-09-10 Thread Mike Gran
> From: Neil Jerram 
> Mike Gran writes:

> > Here's one that doesn't work as expected.
> >
> > guile> (string-match "[:lower:]" "Hi, mom")
> > ==> #("Hi, mom" (5 . 6))
> > guile> (string-match "[:lower:]" "Hí, móm")
> > ==> #f
> >
> > Once you add accents on the vowels, nothing matches.

Doh!  This one doesn't work because it is nonsense.

It should have been [[:lower:]], not [:lower:]
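
The difference shows up at the regcomp(3) level too: "[:lower:]" is an
ordinary bracket expression matching any of the characters ':', 'l',
'o', 'w', 'e', 'r', whereas "[[:lower:]]" is the lowercase character
class.  A small sketch, with a made-up helper name and assuming the
default C locale:

```c
#include <regex.h>

/* Hypothetical helper: return the byte offset of the first match of
   PATTERN in TEXT, or -1 if there is no match or PATTERN is invalid.  */
int
first_match_offset (const char *pattern, const char *text)
{
  regex_t rx;
  regmatch_t m;
  int offset = -1;

  if (regcomp (&rx, pattern, REG_EXTENDED) != 0)
    return -1;

  if (regexec (&rx, text, 1, &m, 0) == 0)
    offset = (int) m.rm_so;

  regfree (&rx);
  return offset;
}
```

With "Hi, mom", the first pattern should match at offset 5 (the 'o',
consistent with the (5 . 6) above) and the second at offset 1 (the
'i').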

Thanks,

Mike




Re: BDW-GC branch updated

2009-09-10 Thread Ludovic Courtès
Hi Neil,

Neil Jerram writes:

> I just installed libgc1c2 and libgc-dev (both 1:7.1-3) on my Debian
> stable/testing machine.  Apparently no problem there.
>
> But there's still no pkgconfig for libgc, and so 
>
> PKG_CHECK_MODULES([BDW_GC], [bdw-gc])
>
> fails:

I checked the upstream tarballs and both 7.0 and 7.1 come with
‘bdw-gc.pc.in’.  Thus I suspect this is a packaging issue.  Can you
report it on the Debian side?

Thanks,
Ludo’.





Re: λ the ultimate showcase

2009-09-10 Thread Ludovic Courtès
Neil Jerram writes:

> l...@gnu.org (Ludovic Courtès) writes:
>
>> Hey,
>>
>> Now that we have Unicode, let’s not put it to good use!
>>
>>   (define-syntax λ
>>     (syntax-rules ()
>>       ((_ formals body ...)
>>        (lambda formals body ...))))
>
> Can it be overridden?

Yes.  In the end it boils down to ‘module-define!’.

> Just in case someone writes an algorithm where they'd really like to
> have λ as a variable?

One can always use ‘λ’ or ‘lambda’ as a local variable name:

  (let ((λ 2))
    (+ λ 3))

> If the answer to the above is Yes, definitely.

Cool, let’s do it!  :-)

(Then we’ll want “’” for ‘quote’, “‘” for ‘quasiquote’, etc. etc.)

Thanks,
Ludo’.





Re: make check fails if no en_US.iso88591 locale

2009-09-10 Thread Mike Gran
On Thu, 2009-09-10 at 17:33 +0200, Ludovic Courtès wrote:

> Do you think it would work to just leave `regexp.test' as it is in 1.8?

It would probably work, but it offends my sense of aesthetics that the
names of the tests would be displayed in the wrong locale for the
terminal.  I'm uploading yet another attempt at doing the right thing in
regexp.test.  Third time's the charm.

> 
> Thanks,
> Ludo'.