Re: Unicode ports patch
Hello!

Mike Gran writes:

> The latest commit 'Add full Unicode capability to ports and the default
> reader' 889975e51accb80491af76fc5db980aeb3edd342 adds the majority of
> the functionality for non-ASCII strings.

This patch adds a few functions related to string ports:

  * libguile/strports.c: store string ports in locale encoding
  (scm_strport_to_locale_u8vector, scm_call_with_output_locale_u8vector)
  (scm_open_input_locale_u8vector, scm_get_output_locale_u8vector):
  new functions

I think it would be nicer if these used bytevectors instead of u8vectors
and were locale-independent (which would match the `string->utf8' &
co. API). Also I would make `scm_strport_to_locale_u8vector ()'
private. And finally, it'd be even better if it were documented in the
manual. :-)

Actually I'm not convinced that `call-with-output-locale-*' and
`open-input-locale-*' are useful, precisely because we can use a string
port to get a string and then `string->utf8' to get at the string bits.

What do you think?

Thanks,
Ludo'.
Re: [BDW-GC] "Inlined" storage; `scm_take_' functions
Hi,

Mike Gran writes:

> On Tue, 2009-09-01 at 02:14 +0200, Ludovic Courtès wrote:

[...]

>> The `scm_take_' functions for strings/symbols/bytevectors are now
>> essentially aliases to the corresponding `scm_from_' because we cannot
>> advantageously reuse the provided storage.
>>
>> Should these functions be deprecated or discouraged?
>>
>
> codesearch.google.com says that scm_take_ isn't often used by other
> projects, but, it is used by lilypond. I think that's reason enough to
> leave it in. I'd vote for keeping them and adjusting the docs to say
> something like
>
>   Like `scm_from_locale_string' and `scm_from_locale_stringn',
>   respectively, but also immediately frees STR after creating
>   the Guile string.
>
> Or something like that.

Of course, I meant "keep them but possibly moved into
{discouraged,deprecated}.c".

Your doc suggestion looks good to me also.

Thanks,
Ludo'.
Re: Unicode ports patch
Hi,

On Tue 01 Sep 2009 10:19, l...@gnu.org (Ludovic Courtès) writes:

> Mike Gran writes:
>
>> The latest commit 'Add full Unicode capability to ports and the default
>> reader' 889975e51accb80491af76fc5db980aeb3edd342 adds the majority of
>> the functionality for non-ASCII strings.
>
> This patch adds a few functions related to string ports:
>
>   * libguile/strports.c: store string ports in locale encoding
>   (scm_strport_to_locale_u8vector, scm_call_with_output_locale_u8vector)
>   (scm_open_input_locale_u8vector, scm_get_output_locale_u8vector):
>   new functions
>
> I think it would be nicer if these used bytevectors instead of u8vectors
> and were locale-independent (which would match the `string->utf8' &
> co. API). Also I would make `scm_strport_to_locale_u8vector ()'
> private. And finally, it'd be even better if it were documented in the
> manual. :-)
>
> Actually I'm not convinced that `call-with-output-locale-*' and
> `open-input-locale-*' are useful, precisely because we can use a string
> port to get a string and then `string->utf8' to get at the string bits.

FWIW, I think I agree with all of Ludovic's comments; though if there is
a way that we can simply arrange to output bytes to an R6RS binary
output port, I think there are already efficient means to collect the
bytes from such a port in a bytevector.

Cheers,

Andy
--
http://wingolog.org/
Re: Unicode ports patch
----- Original Message -----
> From: Andy Wingo
> To: Ludovic Courtès
> Cc: guile-devel@gnu.org
> Sent: Tuesday, September 1, 2009 11:25:26 AM
> Subject: Re: Unicode ports patch
>
> Hi,
>
> On Tue 01 Sep 2009 10:19, l...@gnu.org (Ludovic Courtès) writes:
>
> > Mike Gran writes:
> >
> >> The latest commit 'Add full Unicode capability to ports and the default
> >> reader' 889975e51accb80491af76fc5db980aeb3edd342 adds the majority of
> >> the functionality for non-ASCII strings.
> >
> > This patch adds a few functions related to string ports:
> >
> >   * libguile/strports.c: store string ports in locale encoding
> >   (scm_strport_to_locale_u8vector, scm_call_with_output_locale_u8vector)
> >   (scm_open_input_locale_u8vector, scm_get_output_locale_u8vector):
> >   new functions
> >
> > I think it would be nicer if these used bytevectors instead of u8vectors
> > and were locale-independent (which would match the `string->utf8' &
> > co. API). Also I would make `scm_strport_to_locale_u8vector ()'
> > private. And finally, it'd be even better if it were documented in the
> > manual. :-)

I don't understand.

"it would be nicer if *these* ..."

To what does *these* refer: string ports? It would be nicer if we replace
string ports with bytevector ports? Or it would be nicer if
scm_get_output_locale_u8vector was scm_get_output_bytevector?

"it would be nicer if these used bytevectors ... and were
*locale-independent*"

It would be nicer if string ports were actually bytevector ports, and that
they were locale-independent? Or that scm_get_output_bytevector returned a
locale-independent (ergo 8-bit or 32-bit) vector?

> >
> > Actually I'm not convinced that `call-with-output-locale-*' and
> > `open-input-locale-*' are useful, precisely because we can use a string
> > port to get a string and then `string->utf8' to get at the string bits.

"We can use a string port to get a string"

If we write to a string port and pop a result string?

"And then use string->utf8 to get at the string bits"

And then convert the result string to a UTF-8 encoded bytevector?

>
> FWIW, I think I agree with all of Ludovic's comments; though if there is
> a way that we can simply arrange to output bytes to an R6RS binary
> output port, I think there are already efficient means to collect the
> bytes from such a port in a bytevector.

Thanks,

Mike
Re: Unicode ports patch
Hi!

Mike Gran writes:

>> On Tue 01 Sep 2009 10:19, l...@gnu.org (Ludovic Courtès) writes:
>>
>> > Mike Gran writes:
>> >
>> >> The latest commit 'Add full Unicode capability to ports and the default
>> >> reader' 889975e51accb80491af76fc5db980aeb3edd342 adds the majority of
>> >> the functionality for non-ASCII strings.
>> >
>> > This patch adds a few functions related to string ports:
>> >
>> >   * libguile/strports.c: store string ports in locale encoding
>> >   (scm_strport_to_locale_u8vector, scm_call_with_output_locale_u8vector)
>> >   (scm_open_input_locale_u8vector, scm_get_output_locale_u8vector):
>> >   new functions
>> >
>> > I think it would be nicer if these used bytevectors instead of u8vectors
>> > and were locale-independent (which would match the `string->utf8' &
>> > co. API). Also I would make `scm_strport_to_locale_u8vector ()'
>> > private. And finally, it'd be even better if it were documented in the
>> > manual. :-)
>
> I don't understand. "it would be nicer if *these* ..."

"These" was for "these functions".

> "it would be nicer if these used bytevectors ... and were
> *locale-independent*"
>
> It would be nicer if string ports were actually bytevector ports, and that
> they were locale-independent? Or that scm_get_output_bytevector returned a
> locale-independent (ergo 8-bit or 32-bit) vector?

The latter.

>> > Actually I'm not convinced that `call-with-output-locale-*' and
>> > `open-input-locale-*' are useful, precisely because we can use a string
>> > port to get a string and then `string->utf8' to get at the string bits.
>
> "We can use a string port to get a string"
>
> If we write to a string port and pop a result string?

Yes, with `with-output-to-string' for instance.

> "And then use string->utf8 to get at the string bits"
>
> And then convert the result string to a UTF-8 encoded bytevector?

`string->utf8' returns a bytevector containing the UTF-8-encoded string
it is passed.

Thanks,
Ludo'.
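[Editorial illustration.] To make "a bytevector containing the UTF-8-encoded string" concrete, here is a minimal C sketch of the byte sequence UTF-8 assigns to a single code point. The helper name `sketch_utf8_encode` is hypothetical and is not a libguile function; the sketch only illustrates the encoding itself and says nothing about how `string->utf8' is implemented.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (NOT part of libguile): encode the Unicode code
   point CP as UTF-8 into OUT, which must have room for 4 bytes.
   Returns the number of bytes written, or 0 when CP is a surrogate or
   lies beyond U+10FFFF and therefore has no UTF-8 encoding.  */
size_t
sketch_utf8_encode (uint32_t cp, unsigned char *out)
{
  if (cp < 0x80)
    {
      /* ASCII range: one byte, identical to the code point.  */
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      /* Two bytes: 110xxxxx 10xxxxxx.  */
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                   /* surrogates are not encodable */
  if (cp < 0x10000)
    {
      /* Three bytes: 1110xxxx 10xxxxxx 10xxxxxx.  */
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      /* Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.  */
      out[0] = (unsigned char) (0xF0 | (cp >> 18));
      out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;
}
```

For instance, U+00E9 encodes to the two bytes C3 A9 whatever the current locale is, which is the locale independence Ludovic asks for above.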
Re: compiling with -DSCM_DEBUG=1
Hi!

Ken Raeburn writes:

> --- a/libguile/gc.h
> +++ b/libguile/gc.h
> @@ -248,7 +248,7 @@ SCM_INTERNAL void scm_i_ensure_marking(void);
>  SCM_API int scm_debug_cell_accesses_p;
>  SCM_API int scm_expensive_debug_cell_accesses_p;
>  SCM_API int scm_debug_cells_gc_interval ;
> -void scm_i_expensive_validation_check (SCM cell);
> +SCM_API void scm_i_expensive_validation_check (SCM cell);
>  #endif
>
>  SCM_INTERNAL scm_i_pthread_mutex_t scm_i_gc_admin_mutex;

Please apply this one. I guess Guile is seldom compiled with
`SCM_DEBUG' defined.

Thanks,
Ludo'.
Re: more compilation failures: -DSCM_DEBUG_TYPING_STRICTNESS=2
Hi Ken,

Ken Raeburn writes:

> Compiling with SCM_DEBUG_TYPING_STRICTNESS=2 as discussed in __scm.h

Another compilation flag that must be rarely used. :-)

Do you find it useful?

> It also means constant values for static initializers ("{ { BITS } }")
> have a different form from run-time expressions generating certain
> values ("scm_pack (BITS)" calls an inline function), and comparisons
> can't be done with "==" and "!=". (In fact, tags.h already says "SCM
> values can not be compared by using the operator ==", right above the
> definition of scm_is_eq.)
>
> Guess what we're also doing? :-)

> And I haven't even tried compiling Ludovic's bdw-gc-static-alloc
> branch yet, just master.

Indeed, we're in trouble.

> #1: We continue to not support static initialization.

[...]

> #1a: Extend #1 later with whatever internal macros are needed to
> provide the right initialization syntax for constructs used in
> bdw-gc-static-alloc based on the STRICTNESS setting.
>
> #1b: Try to supplement #1 with changes to SCM_PACK or SCM_MAKIFLAG to
> make it not considered a compile-time constant even with STRICTNESS<2
> and thus SCM_UNSPECIFIED, SCM_BOOL_F, etc are never suitable for
> static initialization, catching this problem earlier in the future.

[...]

> #1c: Try to supplement #1 by defaulting to STRICTNESS=2 on platforms
> where the union is passed and returned the same way as the pointer or
> integer in function calls

[...]

> #2: Drop STRICTNESS=2 support and really support static initialization
> with the current macros.
>
> #3: Keep STRICTNESS=2 support, and support static initialization, even
> for application code, with a bunch of new macros.

My preference is for #2 because: (1) I've never used it ;-), and (2)
we're moving away from C anyway. Hmm, weak arguments maybe.

Anyway, in the meantime, we can conditionalize static initialization
stuff from bdw-gc-static-alloc on STRICTNESS == 0 and keep everyone
happy. Does that sound reasonable?

> It looks like the eval code is going to be annoying too

I wouldn't worry much about this one either as it's probably doomed, once
Andy's eval cleanup work is mature.

Things have been moving too fast lately!

Thanks,
Ludo'.
Re: Unicode ports patch
> > It would be nicer if string ports were actually bytevector ports, and that
> > they were locale-independent? Or that scm_get_output_bytevector returned a
> > locale-independent (ergo 8-bit or 32-bit) vector?
>
> The latter.

The test suite requires an API for testing the correctness of the encoding
when writing or displaying a string in a given locale. It also needs an API
for checking that a locale-encoded byte-array can be correctly converted to
a string.

What would you suggest?

Thanks,

Mike
Re: Minor queries about Unicode char docs
Mike Gran writes:

>> For this context I think it would be clearer to say
>>
>>   Return `#t' iff the Unicode code point of `x' is less than the
>>   code point of `y', else `#f'.
>
> Sounds good.

[..]

> I see what you mean. The text should have something like...
>
> "In case folding comparisons, if a character is lowercase and has an
> uppercase form that can be expressed as a single character, its
> uppercase form is used in the comparison. All other characters are not
> modified for the comparison. Note that the German letter Sharp S
> (Eszett) is not uppercased before the comparison since its plural has
> two characters instead of one."
>
> I meant to say 'its _uppercase form_ has two characters instead of one'.

Thanks, those changes sound great. Are you happy to commit them sometime?

Neil
Re: truth of %nil
Mark H Weaver writes:

> On Sun, Aug 30, 2009 at 12:13:59PM +0100, Neil Jerram wrote:
>> Mark H Weaver writes:
>>
>> > This numbering has the nice properties that 0 is #f.
>>
>> Just to be clear: will this mean that (SCM_BOOL_F == 0) ? As things
>> stand I don't think it will, because SCM_MAKIFLAG shifts and adds
>> 0x04.
>
> Yes, that's correct. SCM_BOOL_F is 4. What I should have said above
> is that #f is IFLAG number 0.

Thanks for clarifying that. (And from other threads it seems clear now
that SCM_BOOL_F == 0 would actually be a problem!)

Neil
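[Editorial illustration.] The distinction between "IFLAG number 0" and "the bits of SCM_BOOL_F" can be sketched in a few lines of C. Only the "+ 0x04" and the resulting value 4 come from the thread; the 8-bit shift width and all `sketch_` names are assumptions made for illustration and should not be read as the actual definition of SCM_MAKIFLAG.

```c
#include <stdint.h>

/* Assumption: the flag number is stored above an 8-bit tag field.
   The thread only says that SCM_MAKIFLAG "shifts and adds 0x04".  */
#define SKETCH_IFLAG_SHIFT 8
#define SKETCH_IFLAG_TAG   0x04

/* Build the bit pattern for immediate flag number N.  */
uintptr_t
sketch_make_iflag (unsigned int n)
{
  return ((uintptr_t) n << SKETCH_IFLAG_SHIFT) + SKETCH_IFLAG_TAG;
}

/* Recover the flag number from a bit pattern built above.  */
unsigned int
sketch_iflag_number (uintptr_t bits)
{
  return (unsigned int) (bits >> SKETCH_IFLAG_SHIFT);
}
```

Under these assumptions `sketch_make_iflag (0)' yields 4, not 0: flag number 0 and an all-zero word are different things, which is the point Mark clarifies.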
more compilation failures: -DSCM_DEBUG_TYPING_STRICTNESS=2
Compiling with SCM_DEBUG_TYPING_STRICTNESS=2 causes SCM to be defined as
a union type (though the comments say a struct type), which enhances the
type checking by making random conversions and casts to and from pointer
and integer types not work without going through the correct conversion
macros/functions. Problem is, we're doing some of those.

It also means constant values for static initializers ("{ { BITS } }")
have a different form from run-time expressions generating certain
values ("scm_pack (BITS)" calls an inline function), and comparisons
can't be done with "==" and "!=". (In fact, tags.h already says "SCM
values can not be compared by using the operator ==", right above the
definition of scm_is_eq.)

Guess what we're also doing? :-)

And I haven't even tried compiling Ludovic's bdw-gc-static-alloc
branch yet, just master.

I can clean some of this up trivially -- SCM_PACK/SCM_UNPACK as needed,
change == to scm_is_eq. The initializers make it slightly less trivial,
and I can imagine different courses of action.

#1: We continue to not support static initialization. Move most of the
initializations in the library to the per-file init functions, and for
stuff like the ra_iproc tables in array-map.c we may want *one* internal
initializer macro (SCM_I_UNSPECIFIED_INIT or SCM_I_UNDEFINED_INIT? maybe
even something zero-valued) for filling in slots in static structures
without getting compiler warnings about missing initializers.

#1a: Extend #1 later with whatever internal macros are needed to
provide the right initialization syntax for constructs used in
bdw-gc-static-alloc based on the STRICTNESS setting.

#1b: Try to supplement #1 with changes to SCM_PACK or SCM_MAKIFLAG to
make it not considered a compile-time constant even with STRICTNESS<2
and thus SCM_UNSPECIFIED, SCM_BOOL_F, etc are never suitable for
static initialization, catching this problem earlier in the future.
I believe a use of a comma expression will suffice, but finding a form
that doesn't generate compiler warnings and doesn't generate run-time
code could be tricky. (Though, it becomes easier if we require only no
performance impact when optimizing and with ... what, inline function
support? gcc?)

#1c: Try to supplement #1 by defaulting to STRICTNESS=2 on platforms
where the union is passed and returned the same way as the pointer or
integer in function calls, and where there isn't a significant
performance impact. Probably selected via cpp macros in __scm.h, since
an autoconf feature test would be difficult at best, and still specific
to the compiler used for building libguile and not the one used to build
the application. This helps us avoid the "==" and random casting part of
the problem better in the future. Mac OS X (10.5, Intel) seems to use
the same calling convention both ways in one simple test, though I
haven't tried performance testing.

#2: Drop STRICTNESS=2 support and really support static initialization
with the current macros.

#3: Keep STRICTNESS=2 support, and support static initialization, even
for application code, with a bunch of new macros.

Thoughts? My preference is for #1 now, and #1a/b/c when convenient or
needed.

Ken
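[Editorial illustration.] The three symptoms Ken lists (no "==", brace-only static initializers, a function call for run-time packing) all follow from wrapping the bits in a union. The C sketch below mirrors the "{ { BITS } }" shape quoted in the message, but every `sketch_` name is hypothetical; it is not the actual contents of tags.h or __scm.h.

```c
#include <stdint.h>

typedef uintptr_t sketch_bits;

/* A union wrapping a one-member struct, so that the static initializer
   takes the nested-brace form "{ { BITS } }" mentioned in the message.
   With this type, "a == b" on two sketch_scm values is a compile error,
   and so are casts to or from integers and pointers.  */
typedef union { struct { sketch_bits n; } n; } sketch_scm;

/* Unpacking reaches through both layers.  */
#define SKETCH_UNPACK(x) ((x).n.n)

/* Run-time packing is a function call, hence not a constant expression
   and not usable in a static initializer.  */
sketch_scm
sketch_pack (sketch_bits b)
{
  sketch_scm s;
  s.n.n = b;
  return s;
}

/* Comparison has to go through the unpacked bits.  */
int
sketch_is_eq (sketch_scm a, sketch_scm b)
{
  return SKETCH_UNPACK (a) == SKETCH_UNPACK (b);
}

/* Static initialization only works with the brace form; writing
   "= sketch_pack (4)" here would be rejected by a C compiler.  */
const sketch_scm sketch_bool_f = { { 4 } };
```

Code that wants to build under both representations therefore needs one spelling for static data and another for run-time values, which is what options #1a and #3 above would hide behind macros.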
Re: truth of %nil
Mark H Weaver writes:

> What about scm_is_bool? I'm tempted to suggest that it should work
> the same way as "boolean?" within scheme, whatever that may be. I
> tend to think they ought to treat %nil as boolean, though I'm less
> sure of this than about scm_is_true/false/null. It's the right thing
> for type-checking an argument that is expected to be boolean, which
> seems to be fairly common in guile. More complex code that is
> dispatching on type (such as the aforementioned GOOPS code) will in
> general have to be fixed to take into account that %nil is both a
> boolean and a list.

I agree (i.e. I think scm_is_bool (SCM_LISP_NIL) should be 1).

> One more thing: scheme code can reasonably expect to "write" a list of
> simple values and then "read" it back in. But now, lists might be
> terminated by %nil instead of '(). Therefore, I think "read" needs to
> be able to read SCM_LISP_NIL in whatever form we "write" it in. I'll
> let someone more knowledgable about guile reader issues decide what
> that form should be. Currently we write it as "#nil".

Interesting point, but seems like one that could be left until it
crops up for real somewhere. I assume the mainline case of writing a
proper list will be fine, because a list like (a b c . #nil) will be
written out as "(a b c)" - right? Then, when read in again, it would
become (a b c . ()) - I think we may have to wait for real cases to
know if that's actually a problem at all.

> Yes, I've also given this some thought. If we were using C++ (I'm
> very glad we're not, btw!) then I'm pretty sure we could use the type
> system to mark certain functions as never returning %nil, and then
> arrange to optimize away the %nil checks in those cases, but I can't
> think of a way to do it with C, even with GCC's extensions. Maybe, if
> we can develop a reasonable proposal, we can get sufficient
> functionality added to GCC.

I was actually meaning the VM compiler... but yes, maybe there are
also C things we could do.

>> So, if you would be happy to do so, can I suggest that you rework your
>> patches so that they also make (and then assume, obviously) the
>> scm_is_false/true/bool/null change, and incorporate my other comments?
>
> I will gladly do so.

Fantastic, thanks (and also for your 'Yes's to the other add-on
pieces)!

> Also, I signed my copyright assignment papers a while ago, and the
> relevant file on fencepost has been updated accordingly.

Yes, indeed; we (maintainers) got notified about that at the time;
apologies for not closing the loop with you then.

Regards,
Neil
Re: [PATCH] %nil-handling optimization and fixes v1
Mark H Weaver writes:

> I agree that the names are uncomfortably long. We could shorten them
> without much loss of clarity by replacing "lisp_nil" with "nil" and
> "and_not" with "not", yielding:
>
>   scm_is_false_assume_not_nil  scm_is_true_assume_not_nil
>   scm_is_false_not_nil         scm_is_true_or_nil
>   scm_is_false_or_nil          scm_is_true_not_nil
>
>   scm_is_lisp_false            scm_is_lisp_true
>
>   scm_is_null_assume_not_nil
>   scm_is_null_not_nil
>   scm_is_null_or_nil
>
>   scm_is_bool_not_nil
>   scm_is_bool_or_nil
>
> I can still do this if y'all would prefer the shorter names.

FWIW, dropping "lisp_" looks OK, but I'm not sure about dropping
"and_". "scm_is_false_not_nil" feels notably harder to understand
than "scm_is_false_and_not_nil".

> Are there any remaining objections to mapping scm_is_false/true/null
> as follows?
>
>   scm_is_null  --> scm_is_null_or_lisp_nil
>   scm_is_false --> scm_is_false_or_lisp_nil
>   scm_is_true  --> scm_is_true_and_not_lisp_nil

(Not from me - but I assume you're checking with others...)

Neil
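[Editorial illustration.] The proposed mapping amounts to a small truth table, sketched below in C. The enum and the `sk_` names are hypothetical stand-ins for Guile's immediate values; Mark's actual patch is about doing these checks efficiently on the real bit patterns, which this sketch does not attempt to reproduce.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the immediates involved:
   #f, #t, %nil (Lisp nil), '() (end of list), anything else.  */
typedef enum { SK_FALSE, SK_TRUE, SK_NIL, SK_EOL, SK_OTHER } sk_value;

/* Proposed meaning of scm_is_false: #f or %nil.  */
bool
sk_is_false_or_lisp_nil (sk_value x)
{
  return x == SK_FALSE || x == SK_NIL;
}

/* Proposed meaning of scm_is_true: anything that is not false above.  */
bool
sk_is_true_and_not_lisp_nil (sk_value x)
{
  return !sk_is_false_or_lisp_nil (x);
}

/* Proposed meaning of scm_is_null: '() or %nil.  */
bool
sk_is_null_or_lisp_nil (sk_value x)
{
  return x == SK_EOL || x == SK_NIL;
}

/* The stricter variant, for code that must tell #f from %nil.  */
bool
sk_is_false_and_not_lisp_nil (sk_value x)
{
  return x == SK_FALSE;
}
```

In this model %nil is the only value that is both false and null, while '() stays true, as it is in Scheme.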
Away for a few days
I'm going to be travelling for a few days and so away from email; should be looking at the list again in about a week from now. So, apologies in advance for my non-responsiveness... Neil
Re: trace examples broken in master
Julian Graham writes:

>> This should all be fixed in master now. Can you have a go and let me
>> know if you still see any problems?
>
> Just built from HEAD. The errors I reported earlier are gone, but I'm
> still not getting any trace output from the `rev' example in the
> manual.

I'm sorry, I didn't actually try running those examples myself, and I
still haven't done that yet now - but just a thought: did you try with

  ,o interp #t

so as to use the evaluator instead of the compiler + VM ? It's
absolutely expected that all the trap stuff doesn't work yet with the
VM. (It's on my list!)

Neil
Re: more compilation failures: -DSCM_DEBUG_TYPING_STRICTNESS=2
On Sep 1, 2009, at 15:47, Ludovic Courtès wrote:
>> Compiling with SCM_DEBUG_TYPING_STRICTNESS=2 as discussed in __scm.h
>
> Another compilation flag that must be rarely used. :-)
>
> Do you find it useful?

Not so far. :-) There seems to be a lot of otherwise correct code
making assumptions about using casts or "==" or whatever; I haven't
sorted out whether there are actual bugs being flagged in there too.

The corresponding code in Emacs has helped me track down code that was
being sloppy about mixing integers and Lisp values, and that was good,
since I'm trying to change things so that integers represented in Lisp
have bit patterns different from the integers themselves, so you really
need to make the distinction. But IIRC it took a while to make it work
again, because nothing was really using it when I started poking at it.

In the Guile case, I'm a tiny bit concerned about some of the
pointer/int games played (e.g., I'm pretty sure C99 does not guarantee
that you can convert an arbitrary uintptr_t value to pointer and back
and be guaranteed of getting the original value... but I don't know of a
platform that actually violates that assumption), but only a tiny bit.

> My preference is for #2 because: (1) I've never used it ;-), and (2)
> we're moving away from C anyway. Hmm, weak arguments maybe.
>
> Anyway, in the meantime, we can conditionalize static initialization
> stuff from bdw-gc-static-alloc on STRICTNESS == 0 and keep everyone
> happy. Does that sound reasonable?

Sure. Actually, STRICTNESS=1 is the default -- 0 makes SCM an integer,
1 makes it a pointer to a struct, which adds a little more type safety,
and 2 makes it a union, which breaks casting, initialization, etc.

>> It looks like the eval code is going to be annoying too
>
> I wouldn't worry much about this one either as it's probably doomed, once
> Andy's eval cleanup work is mature.
>
> Things have been moving too fast lately!

I'm not going to complain about that! :-)

Ken
[BDW-GC] Performance impact of static allocation
Hello!

I just updated the `bdw-gc-static-alloc' branch [0]. It statically
allocates stringbufs, strings, and subrs defined using the "snarfing
macros". It links `libguile' with `-z relro' such that constants
needing relocation are placed in a `PT_GNU_RELRO' ELF segment, which is
made read-only by the dynamic linker after relocation.

Current libgc CVS (pre-7.2) is aware of `PT_GNU_RELRO' [1, 2] and does
not scan data contained in this segment (since it cannot possibly
contain pointers to heap-allocated objects). This was not the case in
7.1.

Thus I compared `bdw-gc-static-alloc' to `boehm-demers-weiser-gc' using
the latest libgc. The benchmark is (again) `gcbench.scm'; the baseline
is current Guile `master':

  * no static allocation (boehm-demers-weiser-gc)

                         heap size (MiB)   execution time (s.)
    Guile                 53.83 (1.00x)      21.167 (1.00x)
    BDW-GC, FSD=3         53.22 (0.99x)      15.204 (0.72x) !
    BDW-GC, FSD=6         53.11 (0.99x)      15.457 (0.73x) !
    BDW-GC, FSD=9         43.62 (0.81x)      16.810 (0.79x) !
    BDW-GC, FSD=3 incr.   63.22 (1.17x)      19.716 (0.93x)
    BDW-GC, FSD=3 gene.   96.42 (1.79x)      18.641 (0.88x)

  * static allocation, `-z relro' (bdw-gc-static-alloc)

                         heap size (MiB)   execution time (s.)
    Guile                 53.83 (1.00x)      20.556 (1.00x)
    BDW-GC, FSD=3         47.33 (0.88x)      14.391 (0.70x) !
    BDW-GC, FSD=6         46.75 (0.87x)      14.914 (0.73x) !
    BDW-GC, FSD=9         44.74 (0.83x)      15.130 (0.74x) !
    BDW-GC, FSD=3 incr.   82.51 (1.53x)      19.023 (0.93x)
    BDW-GC, FSD=3 gene.   93.87 (1.74x)      17.147 (0.83x)

  * static allocation (bdw-gc-static-alloc), *without* `-z relro'

                         heap size (MiB)   execution time (s.)
    Guile                 53.83 (1.00x)      21.768 (1.00x)
    BDW-GC, FSD=3         55.85 (1.04x)      16.156 (0.74x)
    BDW-GC, FSD=6         55.29 (1.03x)      15.220 (0.70x)
    BDW-GC, FSD=9         44.74 (0.83x)      16.355 (0.75x) !
    BDW-GC, FSD=3 incr.  102.09 (1.90x)      20.571 (0.95x)
    BDW-GC, FSD=3 gene.   89.70 (1.67x)      19.215 (0.88x)

    (Heap usage in incremental mode seems to be highly variable in this
    case, sometimes going as high as 128 MiB.)

Conclusions
-----------

Libgc's handling of `PT_GNU_RELRO' segments provides a noticeable
improvement. With it, static allocation actually improves performance
(5%) and leads to reduced heap usage (11%).

However, when not linking with `-z relro', static allocation leads to
slightly degraded performance and increased heap usage (perhaps due to
misidentified pointers in the `.data.rel.ro' section?). This is
probably worth some investigation on the BDW-GC side.

Thanks,
Ludo'.

[0] http://git.savannah.gnu.org/gitweb/?p=guile.git;a=commit;h=5f236208d0d864546e59afa0f5a11c9b3ba14b10
[1] http://thread.gmane.org/gmane.comp.programming.garbage-collection.boehmgc/2570
[2] http://thread.gmane.org/gmane.comp.programming.garbage-collection.boehmgc/2716
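[Editorial illustration.] The 5% and 11% figures in the conclusions appear to correspond to the "BDW-GC, FSD=3" rows of the first two tables; that reading is an inference from the numbers, not something the message states. A minimal C sketch of the arithmetic, with a hypothetical helper name:

```c
/* Hypothetical helper: percentage by which AFTER is smaller than
   BEFORE.  Positive means a reduction, negative means an increase.  */
double
sketch_percent_reduction (double before, double after)
{
  return (before - after) / before * 100.0;
}
```

With the FSD=3 rows, `sketch_percent_reduction (15.204, 14.391)' is about 5.3 (execution time) and `sketch_percent_reduction (53.22, 47.33)' is about 11.1 (heap size), which matches the rounded figures quoted above.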