Re: make check fails if no en_US.iso88591 locale
Hello!

I built today’s ‘master’ on a ppc64 box and there are many regexp-related errors and a surprisingly high number of unresolved regexp-related tests:

  http://autobuild.josefsson.org/guile/log-200909100539539848000.txt

This machine only has the following locales:

  C
  en_US.utf8
  POSIX

Thanks,
Ludo’.
Re: make check fails if no en_US.iso88591 locale
On Thu, 2009-09-10 at 12:27 +0200, Ludovic Courtès wrote:
> Hello!
>
> I built today’s ‘master’ on a ppc64 box and there are many
> regexp-related errors and a surprisingly high number of unresolved
> regexp-related tests:
>
> http://autobuild.josefsson.org/guile/log-200909100539539848000.txt
>
> This machine only has the following locales:
>
> C
> en_US.utf8
> POSIX
>

I'm not surprised to see the unresolved tests, since I'd wrapped a lot of those tests to throw unresolved if a Latin-1 locale wasn't found. The errors are a surprise: they indicate that my strategy for wrapping in a Latin-1 locale isn't correct.

The reason for declaring a Latin-1 locale was to allow scm_to/from_locale_string to convert a Scheme string with values from 0 to 255 to an 8-bit binary C string. regexp.test runs on arbitrary binary data, which wasn't a problem in guile-1.8 since scm_to/from_locale_string did no real locale conversion.

I could fix the test by testing only characters 0 to 127 in a C locale if a Latin-1 locale can't be found.

I can also fix the test by using the 'setbinary' function to force the encodings on stdin and stdout to a default value that will pass through binary data, instead of calling 'setlocale'. The procedure 'setbinary' was always a hack, and I kind of want to get rid of it, but this is why it was created.

I looked in the POSIX spec on Regex for specific advice on using 128-255 in regex in the C locale. I didn't see anything offhand. The spec does spend a lot of time talking about the interaction between the locale and regular expressions. I get the impression from the spec that using regex on 128-255 in the C locale is an unexpected use of regular expressions.

Thanks,
Mike
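The fallback described above (use a Latin-1 locale if one exists, otherwise stay in the C locale and test only 0 to 127) could look roughly like this at the C level. This is only a sketch: `select_test_locale` and `max_test_char` are hypothetical names, not Guile functions, and the real test suite does this in Scheme.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper, not part of Guile: try each name in the
   NULL-terminated CANDIDATES list and return the first one that
   setlocale accepts.  If none is accepted, switch to the "C" locale
   and return NULL.  */
const char *
select_test_locale (const char *const *candidates)
{
  assert (candidates != NULL);
  for (size_t i = 0; candidates[i] != NULL; i++)
    if (setlocale (LC_ALL, candidates[i]) != NULL)
      return candidates[i];
  setlocale (LC_ALL, "C");
  return NULL;
}

/* Highest character value worth testing: 255 when a Latin-1 locale was
   found, 127 (plain ASCII) otherwise.  */
int
max_test_char (const char *latin1_locale)
{
  return latin1_locale != NULL ? 255 : 127;
}
```

A caller would pass a list such as "en_US.iso88591", "en_US.ISO-8859-1", NULL; which spellings a given system accepts is an assumption and varies between libcs.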
Re: make check fails if no en_US.iso88591 locale
Mike Gran writes:

> I could fix the test by testing only characters 0 to 127 in a C locale
> if a Latin-1 locale can't be found.

Yes, that'd be nice.

> I can also fix the test by using the 'setbinary' function

--8<---cut here---start->8---
scheme@(guile-user)> (help setbinary)
`setbinary' is a primitive procedure in the (guile) module.

 -- Scheme Procedure: setbinary
     Sets the encoding for the current input, output, and error ports
     to ISO-8859-1. That character encoding allows ports to operate on
     binary data.

     It also sets the default encoding for newly created ports to
     ISO-8859-1.

     The previous default encoding for new ports is returned
--8<---cut here---end--->8---

It seems to do a lot of things, which aren't clear from the name. ;-)

What can be done about it? At least it should be renamed, to `set-port-binary-mode!' or similar.

Then it'd be nice if that functionality could be split into several functions, some operating on a per-port basis. After all, one can already do:

  (for-each (lambda (p)
              (set-port-encoding! p "ISO-8859-1"))
            (list (current-input-port) (current-output-port)
                  (current-error-port)))

So we just lack:

  ;; encoding for newly created ports
  (set-default-port-encoding! "ISO-8859-1")

With that, `setbinary' can be implemented in Scheme.

> to force the encodings on stdin and stdout to a default value that
> will pass through binary data, instead of calling 'setlocale'.

Hmm, I think I'd still prefer `setlocale'.

regexec(3) doesn't say anything about the string encoding. Do libc implementations actually expect plain ASCII or Latin-1? Or do they adapt to the current locale's encoding?

> I looked in the POSIX spec on Regex for specific advice using 128-255 in
> regex in the C locale. I didn't see anything offhand. The spec does
> spend a lot of time talking about the interaction between the locale and
> regular expressions. I get the impression from the spec that using
> regex on 128-255 in the C locale is an unexpected use of regular
> expressions.

http://www.opengroup.org/onlinepubs/9699919799/functions/regexec.html reads:

  If, when regexec() is called, the locale is different from when the
  regular expression was compiled, the result is undefined.

It makes me think that, if a process runs with a UTF-8 locale and passes raw UTF-8 bytes to regcomp(3) and regexec(3), it may work.

Hmm, the program below, with UTF-8-encoded source, works both with a Latin-1 and a UTF-8 locale:

  #include <locale.h>
  #include <regex.h>
  #include <stdlib.h>

  int
  main (int argc, char *argv[])
  {
    regex_t rx;
    regmatch_t match;

    setlocale (LC_ALL, "fr_FR.utf8");
    regcomp (&rx, "ça", REG_EXTENDED);

    return regexec (&rx, "ça va ?", 1, &match, 0) == 0
      ? EXIT_SUCCESS : EXIT_FAILURE;
  }

Do you think it would work to just leave `regexp.test' as it is in 1.8?

Thanks,
Ludo'.
λ the ultimate showcase
Hey,

Now that we have Unicode, let’s not put it to good use!

  (define-syntax λ
    (syntax-rules ()
      ((_ formals body ...)
       (lambda formals body ...))))

Should ‘boot-9.scm’ provide this macro?

Ludo’.
Re: λ the ultimate showcase
l...@gnu.org (Ludovic Courtès) writes:

> Now that we have Unicode, let’s not put it to good use!

Someone must have tampered with my message. Of course, it should read “let’s put it to good use”.

Ludo’.
Re: λ the ultimate showcase
l...@gnu.org (Ludovic Courtès) writes:

> Hey,
>
> Now that we have Unicode, let’s not put it to good use!
>
> (define-syntax λ
>   (syntax-rules ()
>     ((_ formals body ...)
>      (lambda formals body ...))))

Can it be overridden? Just in case someone writes an algorithm where they'd really like to have λ as a variable? (In other words, I guess, can define-syntax things in general be overridden?)

> Should ‘boot-9.scm’ provide this macro?

If the answer to the above is Yes, definitely.

Neil
Re: make check fails if no en_US.iso88591 locale
Mike Gran writes:

> I'm not much of a regex guy, but, here's a couple of examples. First
> one that sort of works as expected.
>
> guile> (string-match "sé" "José")
> ==> #("José" (2 . 5))
>
> Regex properly matches the word, but, the match struct (2 . 5) is
> referring to the bytes of the string, not the characters of the string.

That's with a UTF-8 locale, isn't it? With latin-1 I suppose the numbers would be (2 . 4), right?

> Here's one that doesn't work as expected.
>
> guile> (string-match "[:lower:]" "Hi, mom")
> ==> #("Hi, mom" (5 . 6))
> guile> (string-match "[:lower:]" "Hí, móm")
> ==> #f
>
> Once you add accents on the vowels, nothing matches.
>
> Thanks,

Thank you! Do you think it would be good to add these examples to the manual? (I'm happy to do that if so.)

Neil
Re: BDW-GC branch updated
l...@gnu.org (Ludovic Courtès) writes:

>>> So now is a good time to test it and report back! It requires libgc 7.1
>>> or later, which isn't packaged in Debian, although it was released in
>>> May 2008.
>>>
>> It's in experimental since recently; I assume its maintainer will upload
>> to unstable soonish.
>
> Good.

I just installed libgc1c2 and libgc-dev (both 1:7.1-3) on my Debian stable/testing machine. Apparently no problem there.

But there's still no pkgconfig for libgc, and so

  PKG_CHECK_MODULES([BDW_GC], [bdw-gc])

fails:

  checking for BDW_GC... configure: error: Package requirements (bdw-gc) were not met:

  No package 'bdw-gc' found

Am I missing some easy solution? (I haven't tried the approach of setting BDW_GC_CFLAGS and BDW_GC_LIBS yet.)

Neil
Re: make check fails if no en_US.iso88591 locale
> From: Neil Jerram
> Mike Gran writes:
> > Here's one that doesn't work as expected.
> >
> > guile> (string-match "[:lower:]" "Hi, mom")
> > ==> #("Hi, mom" (5 . 6))
> > guile> (string-match "[:lower:]" "Hí, móm")
> > ==> #f
> >
> > Once you add accents on the vowels, nothing matches.

Doh! This one doesn't work because it is nonsense. It should have been [[:lower:]], not [:lower:].

Thanks,
Mike
Re: BDW-GC branch updated
Hi Neil,

Neil Jerram writes:

> I just installed libgc1c2 and libgc-dev (both 1:7.1-3) on my Debian
> stable/testing machine. Apparently no problem there.
>
> But there's still no pkgconfig for libgc, and so
>
> PKG_CHECK_MODULES([BDW_GC], [bdw-gc])
>
> fails:

I checked the upstream tarballs and both 7.0 and 7.1 come with ‘bdw-gc.pc.in’. Thus I suspect this is a packaging issue. Can you report it on the Debian side?

Thanks,
Ludo’.
Re: λ the ultimate showcase
Neil Jerram writes:

> l...@gnu.org (Ludovic Courtès) writes:
>
>> Hey,
>>
>> Now that we have Unicode, let’s not put it to good use!
>>
>> (define-syntax λ
>>   (syntax-rules ()
>>     ((_ formals body ...)
>>      (lambda formals body ...))))
>
> Can it be overridden?

Yes. In the end it boils down to ‘module-define!’.

> Just in case someone writes an algorithm where they'd really like to
> have λ as a variable?

One can always use ‘λ’ or ‘lambda’ as a local variable name:

  (let ((λ 2))
    (+ λ 3))

> If the answer to the above is Yes, definitely.

Cool, let’s do it! :-)

(Then we’ll want “’” for ‘quote’, “‘” for ‘quasiquote’, etc. etc.)

Thanks,
Ludo’.
Re: make check fails if no en_US.iso88591 locale
On Thu, 2009-09-10 at 17:33 +0200, Ludovic Courtès wrote:

> Do you think it would work to just leave `regexp.test' as it is in 1.8?

It would probably work, but it offends my sense of aesthetics that the names of the tests would be displayed in the wrong locale for the terminal.

I'm uploading yet another attempt at doing the right thing in regexp.test. Third time's a charm.

>
> Thanks,
> Ludo'.