unicode status

2009-09-06 Thread Andy Wingo
Hey Mike,

Would you mind posting to the list a "state of unicode & guile" summary?
I'm very excited about finally being able to say "Guile does unicode",
and was wondering what was left to do :)

Andy
-- 
http://wingolog.org/




Re: compiling with -DSCM_DEBUG=1

2009-09-06 Thread Andy Wingo
On Sun 06 Sep 2009 02:37, Ken Raeburn writes:

> BTW, the bdw-gc branch with my patch and SCM_DEBUG==1 still fails tests
> on my Mac.
>
> In guardians.c, line 169, SCM_CAR is applied to a non-pair:
>
> Running popen.test
> Running ports.test
>
> scm_error_pair_access
> Non-pair accessed with SCM_C[AD]R: `ERROR: In procedure symbol->string:
> ERROR: Wrong type argument in position 1 (expecting symbol): #<guardian 124e300 (reachable: 15 unreachable: 1)>
> FAIL: check-guile
>
> I use a modified scm_error_pair_access() that prints the function's name
> (as seen above) and then sleeps a while, so I can attach gdb and get
> this stack trace before letting it resume:
>
> #0  0x9487546e in __semwait_signal ()
> #1  0x948752ef in nanosleep$UNIX2003 ()
> #2  0x948cae71 in sleep$UNIX2003 ()
> #3  0x002ea943 in scm_error_pair_access (non_pair=0x11d9180) at ../../libguile/pairs.c:50
> #4  0x002c8e35 in finalize_guarded (ptr=0x11d91f0, finalizer_data=0x11d9188) at ../../libguile/guardians.c:169
> #5  0x000a0866 in GC_invoke_finalizers ()
> #6  0x000a2ce9 in GC_generic_malloc_many ()
> #7  0x000ab01a in GC_malloc ()
> #8  0x002ea987 in scm_cell [inlined] () at inline.h:66
> #9  0x002ea987 in scm_cons (x=0x9487546e, y=0x9487546e) at ../../libguile/pairs.c:66
> #10 0x002d0f5a in scm_make_list (n=0x4, init=0xb2920) at ../../libguile/list.c:121
> #11 0x002c8bbc in finalize_guarded (ptr=0x12615e8, finalizer_data=0x12615a0) at ../../libguile/guardians.c:125
> #12 0x000a0866 in GC_invoke_finalizers ()

Thanks for catching this bug. I'm sure Ludovic will look into it. We
should not release a 1.9.3 with bugs like this one.

A
-- 
http://wingolog.org/




Re: unicode status

2009-09-06 Thread Mike Gran
On Sun, 2009-09-06 at 12:45 +0200, Andy Wingo wrote:
> Hey Mike,
> 
> Would you mind posting to the list a "state of unicode & guile" summary?
> I'm very excited about finally being able to say "Guile does unicode",
> and was wondering what was left to do :)
> 
> Andy

OK.  

First, here's the stuff I've already put in NEWS:

** Characters

Characters can take the whole Unicode range.  char-upcase and
char-downcase use default Unicode casing rules. Character comparisons
such as char<? [...]->double and locale-string->int.  Bruno has
some suggestions on how to do that at
http://savannah.gnu.org/support/?106998

* I haven't done any testing on readline or gettext.

* Unicode-capable regex has not been implemented.  Libunistring might do
this someday.  Until then, there will probably have to be a hack where
strings are converted to UTF-8 before being passed through regex.  This
doesn't get you Unicode regex, but it keeps non-ASCII text from being
mangled by regex.

* Emacs has a lot of aliases that can be used in the "-*- coding: X
-*-" line, like latin-1, that aren't valid encoding names.  The reader
should be modified to understand the common ones.

* The whole issue of R6RS compliance will have to be dealt with some
day.  For example, I went with \xHH, \uHHHH and \UHHHHHH escapes because
they were backwards compatible with the \xHH we already had.  R6RS uses
a variable-length hex escape terminated by a semicolon: \xHH; \xHHH;.
These are not backward compatible.  There are some R6RS functions that
are missing: string-foldcase and the string normalization routines.
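
To illustrate the incompatibility described above, here is a minimal Python
sketch (not Guile's reader) that decodes the same input under both rules.
The function names are hypothetical, and the fixed widths assumed for \u and
\U (four and six hex digits) are an assumption of this sketch.

```python
import re

# Fixed-width escapes: \xHH, \uHHHH, \UHHHHHH (the \u and \U widths are assumed).
_FIXED = re.compile(
    r"\\x([0-9a-fA-F]{2})|\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{6})"
)
# R6RS-style escape: \x, one or more hex digits, then a terminating semicolon.
_R6RS = re.compile(r"\\x([0-9a-fA-F]+);")


def decode_fixed(text):
    """Replace fixed-width hex escapes with the characters they name."""
    def repl(match):
        digits = match.group(1) or match.group(2) or match.group(3)
        return chr(int(digits, 16))
    return _FIXED.sub(repl, text)


def decode_r6rs(text):
    """Replace semicolon-terminated hex escapes with the characters they name."""
    return _R6RS.sub(lambda match: chr(int(match.group(1), 16)), text)


source = r"\x41bc;"
# Fixed-width rule: only two hex digits are consumed, the rest stays literal.
print(repr(decode_fixed(source)))
# R6RS rule: all four hex digits are consumed and ';' ends the escape.
print(repr(decode_r6rs(source)))
```

The same source text therefore names different strings under the two rules,
which is why one reader cannot silently accept both.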

Also, R6RS and R5RS seem to disagree on the definition of string-upcase
et al.  R6RS is clear that the result of string-upcase can have more
letters than its input, and it gets rid of string-upcase! for the same
reason.
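
The length change can be seen with any full Unicode case mapping.  A small
Python sketch, using Python's own str.upper() rather than Guile, and a
hypothetical upcase_in_place helper standing in for a string-upcase!-style
update of a fixed-length buffer:

```python
def upcase_in_place(chars):
    """Mimic an in-place upcase on a fixed-length character buffer."""
    for index, char in enumerate(chars):
        upper = char.upper()
        if len(upper) != 1:
            # Full Unicode case mapping needs more room than one slot offers.
            raise ValueError(
                f"{char!r} upcases to {upper!r}, which does not fit in one slot"
            )
        chars[index] = upper


word = "Straße"
# The upcased copy is one character longer than the input.
print(len(word), len(word.upper()))

buffer = list(word)
try:
    upcase_in_place(buffer)
except ValueError as error:
    print("in-place upcase failed:", error)
```

A copying upcase can simply return the longer string; an in-place one has
nowhere to put the extra character.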

That's all I remember off the top of my head.

Thanks,

Mike







Re: Status Update, Elisp Compiler

2009-09-06 Thread Daniel Kraft

Mark H Weaver wrote:
> Daniel Kraft wrote:
>> 4) I've not done anything yet regarding converting '() -> %nil in lists
>> that are seen from elisp; I think the final conclusion was that we want
>> such a conversion, and so I will do that.  But maybe I can try if it is
>> reasonably possible to allow switching it off to regain performance
>> without?  Maybe also allow switching off the #f -> %nil conversion for
>> booleans (t = #t).
>
> I thought the point of the %nil work was to avoid the need for such
> conversions.  My %nil patch (in progress) includes a fast macro for
> boolean testing from lisp, which considers (), #f, and %nil to all be
> false.


Yes, of course.  I remember we discussed those issues, but I think the
last I heard was that we would do a translation, at least partially or
in "some cases".  For booleans this is done at the moment (but only for
those that are seen from elisp, i.e. not when some boolean expression is
only generated by the compiler).  But once again I'm open to discussion
on this subject, and on how best to solve it.


In my opinion, there are two sides to look at separately:  One is from
elisp to scheme, i.e. that (if nil 1 2) can be compiled directly to a
Guile internal if construct without the need to add an elisp->scheme
boolean translation.  That's what your patch will do, and it's a good
idea in any case (in my opinion).


The other side is translating booleans back from Guile internals (like =
or others) that are seen directly from elisp.  I.e., that (= 1 2) is
wrapped in a translator so it returns nil for false instead of #f.
Here, translation is done for booleans at the moment, but not for
end-of-lists.  I've no real opinion on whether we should do
translation here or not...
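
The wrapping described above can be sketched roughly as follows.  This is a
Python model, not the compiler's actual code: the Constant class, the
sentinel names and seen_from_elisp are all hypothetical stand-ins.

```python
class Constant:
    """A unique named value, standing in for a Scheme/elisp constant."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


SCHEME_FALSE = Constant("#f")
SCHEME_TRUE = Constant("#t")  # elisp t is taken to be #t, so it needs no translation
ELISP_NIL = Constant("%nil")


def scheme_num_equal(a, b):
    """Stand-in for a Guile primitive such as `=` that returns #t or #f."""
    return SCHEME_TRUE if a == b else SCHEME_FALSE


def seen_from_elisp(scheme_proc):
    """Wrap a Scheme procedure so that a #f result reaches elisp as %nil."""
    def wrapper(*args):
        result = scheme_proc(*args)
        return ELISP_NIL if result is SCHEME_FALSE else result
    return wrapper


elisp_num_equal = seen_from_elisp(scheme_num_equal)
print(elisp_num_equal(1, 2))  # the false result is translated
print(elisp_num_equal(2, 2))  # the true result passes through unchanged
```

The cost in question is the extra call and comparison on every wrapped
primitive, which is what switching the translation off would save.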



> If it is common to test for %nil using eq, then we could provide a
> special eq which treats (), #f, and %nil as equal, part of the same
> equivalence class.
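
One way to picture such an eq, as a Python sketch only: the sentinels and the
names is_nilish and nil_aware_eq are hypothetical and not taken from the
%nil patch.

```python
class Constant:
    """A unique named value, standing in for a Scheme/elisp constant."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


SCHEME_FALSE = Constant("#f")
EMPTY_LIST = Constant("()")
ELISP_NIL = Constant("%nil")

# The three values that elisp code would treat as one equivalence class.
NIL_CLASS = (SCHEME_FALSE, EMPTY_LIST, ELISP_NIL)


def is_nilish(value):
    """True if value is one of (), #f or %nil (compared by identity)."""
    return any(value is member for member in NIL_CLASS)


def nil_aware_eq(a, b):
    """Identity comparison, except that (), #f and %nil all compare equal."""
    if is_nilish(a) and is_nilish(b):
        return True
    return a is b


print(nil_aware_eq(EMPTY_LIST, ELISP_NIL))  # same equivalence class
print(nil_aware_eq(ELISP_NIL, 0))           # ordinary identity rules apply
```

With a comparison like this, values coming back from Scheme would not need
to be rewritten before elisp code tests them against nil.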


See above, but I think if real-world code does not depend on that and we 
don't strive for "100% compatibility" with emacs/elisp, we could really 
just go without the scheme->elisp translation and try directly.


But as I said, I'd be happy to get other opinions again before I either 
remove the boolean translation or add one for lists.


Thanks a lot for your comments and thoughts, though!  And your patch of 
course ;)


Yours,
Daniel


--
Done:  Arc-Bar-Cav-Ran-Rog-Sam-Tou-Val-Wiz
To go: Hea-Kni-Mon-Pri