On Thu, 2009-09-10 at 12:27 +0200, Ludovic Courtès wrote:
> Hello!
>
> I built today’s ‘master’ on a ppc64 box and there are many
> regexp-related errors and a surprisingly high number of unresolved
> regexp-related tests:
>
> http://autobuild.josefsson.org/guile/log-200909100539539848000.txt
>
> This machine only has the following locales:
>
> C
> en_US.utf8
> POSIX
>
I'm not surprised to see the unresolved results, since I'd wrapped a lot
of those tests to throw unresolved if a Latin-1 locale wasn't found.
The errors are a surprise: they indicate that my strategy for wrapping
the tests in a Latin-1 locale isn't correct.

The reason for declaring a Latin-1 locale was to allow
scm_to/from_locale_string to convert a Scheme string with values from 0
to 255 to an 8-bit binary C string. regexp.test runs on arbitrary
binary data, which wasn't a problem in guile-1.8 since
scm_to/from_locale_string did no real locale conversion.

I could fix the test by testing only characters 0 to 127 in the C locale
if a Latin-1 locale can't be found.

I could also fix the test by using the 'setbinary' function, instead of
calling 'setlocale', to force the encodings on stdin and stdout to a
default value that will pass binary data through. The procedure
'setbinary' was always a hack, and I kind of want to get rid of it, but
this is why it was created.

I looked in the POSIX spec on regular expressions for specific advice
on using characters 128-255 in a regex in the C locale. I didn't see
anything offhand. The spec does spend a lot of time talking about the
interaction between the locale and regular expressions. I get the
impression from the spec that using regex on 128-255 in the C locale is
an unexpected use of regular expressions.

Thanks,

Mike
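
For illustration only, the first option above (use a Latin-1 locale when
one exists, otherwise stay in the C locale and test only 0 to 127) might
look roughly like this in C. This is a sketch, not the code in
regexp.test: the function name choose_test_char_limit and the list of
candidate locale names are hypothetical, and the codeset strings returned
by nl_langinfo vary between systems.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical candidate names: systems spell Latin-1 locales differently. */
static const char *const latin1_candidates[] = {
    "en_US.ISO-8859-1", "en_US.iso88591", "en_US.ISO8859-1", NULL
};

/* Return the highest character code a test should exercise:
   255 if a Latin-1 locale could be installed, otherwise 127
   after falling back to the "C" locale. */
int choose_test_char_limit(void)
{
    for (size_t i = 0; latin1_candidates[i] != NULL; i++) {
        /* setlocale returns NULL when the locale is not installed. */
        if (setlocale(LC_ALL, latin1_candidates[i]) == NULL)
            continue;

        /* Confirm the codeset really is Latin-1; the exact spelling
           is system-dependent, so two common forms are accepted. */
        const char *codeset = nl_langinfo(CODESET);
        if (strcmp(codeset, "ISO-8859-1") == 0 ||
            strcmp(codeset, "ISO8859-1") == 0)
            return 255;
    }

    /* No Latin-1 locale available: restrict the test to ASCII. */
    setlocale(LC_ALL, "C");
    return 127;
}
```

On a machine with only C, en_US.utf8 and POSIX, like the ppc64 box above,
none of the candidates would be accepted and the limit would be 127.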