Re: Wide strings status

2009-04-22 Thread Ludovic Courtès
Hello! Mike Gran writes: > On Tue, 2009-04-21 at 23:37 +0200, Ludovic Courtès wrote: >> You seem to imply that `scm_getc ()' will now return a Unicode >> codepoint, is that right? What about `scm_c_{read,write} ()', and >> `scm_{get,put}s ()'? >> > > I vacillate on this, but, I think the most

Re: Wide strings status

2009-04-21 Thread Mike Gran
On Tue, 2009-04-21 at 23:37 +0200, Ludovic Courtès wrote: > > This is all going to be slower than before because of the string > > conversion operations, but, I didn't want to do any premature > > optimization. First, I wanted to get it working, but, there is plenty > > of room for optimization l

Re: Wide strings status

2009-04-21 Thread Ludovic Courtès
Hello! Mike Gran writes: > Strings are internally encoded either as "narrow" 8-bit ISO-8859-1 > strings or as "wide" UTF-32 strings. Strings are usually created as > narrow strings. Narrow strings get automatically widened to wide > strings if non-8-bit characters are set! or appended to them.
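The narrow/wide scheme quoted above can be illustrated with a minimal sketch in C. This is not Guile's actual implementation: the `demo_string` type and the `demo_*` functions are hypothetical names invented for illustration. It assumes only what the quoted message states, namely that a string starts out narrow (one byte per character, ISO-8859-1) and is widened to UTF-32 the first time a character above 0xFF is stored.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string type for illustration; not Guile's real layout. */
typedef struct {
  size_t len;
  int is_wide;            /* 0: narrow ISO-8859-1, 1: wide UTF-32 */
  union {
    uint8_t *narrow;      /* one byte per character */
    uint32_t *wide;       /* one 32-bit codepoint per character */
  } u;
} demo_string;

/* Create a narrow string of LEN spaces. Returns NULL on allocation failure. */
static demo_string *demo_make(size_t len)
{
  demo_string *s = malloc(sizeof *s);
  if (s == NULL)
    return NULL;
  s->len = len;
  s->is_wide = 0;
  s->u.narrow = malloc(len ? len : 1);
  if (s->u.narrow == NULL) {
    free(s);
    return NULL;
  }
  memset(s->u.narrow, ' ', len);
  return s;
}

/* Convert a narrow string to wide in place. Returns 0 on success. */
static int demo_widen(demo_string *s)
{
  uint32_t *w;
  size_t i;

  if (s->is_wide)
    return 0;
  w = malloc((s->len ? s->len : 1) * sizeof *w);
  if (w == NULL)
    return -1;
  /* ISO-8859-1 byte values equal their Unicode codepoints,
     so widening is a plain element-wise copy. */
  for (i = 0; i < s->len; i++)
    w[i] = s->u.narrow[i];
  free(s->u.narrow);
  s->u.wide = w;
  s->is_wide = 1;
  return 0;
}

/* Store codepoint CP at index I, widening first if CP does not fit
   in 8 bits. Returns 0 on success, -1 on bad index or allocation failure. */
static int demo_set(demo_string *s, size_t i, uint32_t cp)
{
  if (i >= s->len)
    return -1;
  if (!s->is_wide && cp > 0xFF && demo_widen(s) != 0)
    return -1;
  if (s->is_wide)
    s->u.wide[i] = cp;
  else
    s->u.narrow[i] = (uint8_t) cp;
  return 0;
}

/* Read the codepoint at index I (caller guarantees I < len). */
static uint32_t demo_ref(const demo_string *s, size_t i)
{
  return s->is_wide ? s->u.wide[i] : s->u.narrow[i];
}

static void demo_free(demo_string *s)
{
  if (s == NULL)
    return;
  if (s->is_wide)
    free(s->u.wide);
  else
    free(s->u.narrow);
  free(s);
}
```

In this sketch, storing U+00E9 keeps the string narrow, while storing U+03BB triggers the one-time conversion; characters already present keep their values because the ISO-8859-1 range maps directly onto the first 256 Unicode codepoints.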

Re: Wide strings

2009-01-29 Thread Neil Jerram
l...@gnu.org (Ludovic Courtès) writes: >>> Do we need to talk more about what needs to be accomplished? Do we >>> need a complete specification? Do we need a vote on if it is a good >>> idea? >> >> I think you're going in the right direction. More importantly, although >> I can't speak for them, N

Re: Wide strings

2009-01-28 Thread Ludovic Courtès
Hi, Andy Wingo writes: > On Wed 28 Jan 2009 17:44, Mike Gran writes: > >> Since I need this functionality taken care of, and since I have some >> time to play with it, what's the procedure here? > > The best thing IMO would be to hack on it on a Git branch, with small > and correct patches. We

Re: Wide strings

2009-01-28 Thread Ludovic Courtès
Hello, Clinton Ebadi writes: > The `scm_{to|from}_locale_string' functions provide enough abstraction > to make this doable without breaking anything that doesn't use > `scm_take_locale_string' (and even then Guile can detect when the locale > is not UCS-4, revert to `scm_from_locale_string' and

Re: Wide strings

2009-01-28 Thread Clinton Ebadi
Mike Gran writes: > Hi, > > Let's say that one possible goal is to add wide strings > * using Gnulib functions > * with minimal changes to the public Guile API > * where chars become 4-byte codepoints and strings are internally > either UTF-32 or ISO-8859-1 > > Since I need this functionalit

Re: Wide strings

2009-01-28 Thread Andy Wingo
Hi, On Wed 28 Jan 2009 17:44, Mike Gran writes: > Since I need this functionality taken care of, and since I have some > time to play with it, what's the procedure here? The best thing IMO would be to hack on it on a Git branch, with small and correct patches. We could get you commit access if

Re: Wide strings

2009-01-28 Thread Mike Gran
Hi, Let's say that one possible goal is to add wide strings
* using Gnulib functions
* with minimal changes to the public Guile API
* where chars become 4-byte codepoints and strings are internally either UTF-32 or ISO-8859-1
Since I need this functionality taken care of, and since I have so

Re: Wide strings

2009-01-27 Thread Ludovic Courtès
Hi! Mike Gran writes: > Gnulib works for me. Bruno is the maintainer of those funcs, so I'm > sure they work great. Good! > So really the first questions to answer are the encoding question and > whether the R6RS string API is the goal. SRFI-1[34] (i.e., status quo in terms of supported AP

Re: Wide strings

2009-01-27 Thread Andy Wingo
On Tue 27 Jan 2009 06:52, Mike Gran writes: > I said > >> (Though, such a scheme would force scm_take_locale_string to become >> scm_take_iso88591_string.) > > which is incorrect. Under the proposed scheme, scm_take_locale_string > would only be able to use that storage directly if it happened

Re: Wide strings

2009-01-26 Thread Mike Gran
I said > (Though, such a scheme would force scm_take_locale_string to become > scm_take_iso88591_string.) which is incorrect. Under the proposed scheme, scm_take_locale_string would only be able to use that storage directly if it happened to be ASCII or 8859-1.

Re: Wide strings

2009-01-26 Thread Mike Gran
Hello, > Ludo' sez >> Mike Gran writes: > BTW, Gnulib has a wealth of modules that could be helpful here: > http://www.gnu.org/software/gnulib/MODULES.html#posix_ext_unicode > I used a few of them in Guile-R6RS-Libs to implement `string->utf8' > and such like. The Gnulib routines seem perfe

Re: Wide strings

2009-01-26 Thread Ludovic Courtès
Hello, Mike Gran writes: > There are 3 good, actively developed solutions of which I am aware. > > 1.  Use GNU libc functionality.  Encode wide strings as wchar_t. That'd be POSIX functionality, actually. > 2.  Use GLib functionality.  Encode wide strings as UTF-8.  Possibly > give up on O(1).

Re: Wide strings

2009-01-26 Thread Ludovic Courtès
Hello! Neil Jerram writes: > But what about the other possible debate, about the API? Are you > thinking that we should accept R6RS's choice? No, I think we have SRFI-1[34] to start with, both of which are well defined in the context of Unicode. > (I really haven't read up on all this enough

Re: Wide strings

2009-01-26 Thread Mike Gran
> > Ludo sez, > Mike sez, > > 1. IMO it'd be nice to have ASCII strings special-cased so that they > >are always encoded in ASCII. This would allow for memory savings > >since, e.g., most symbols are expected to contain only ASCII > >characters. It might also simplify interaction wit

Re: Wide strings

2009-01-25 Thread Mike Gran
> From: Ludovic Courtès l...@gnu.org I believe that we should aim for R6RS strings. I think the most important thing is to have humility in the face of an impossible problem: how to encode all textual information.  It is important to "stand on the shoulders of giants" here.  It becomes a matter o

Re: Wide strings

2009-01-25 Thread Neil Jerram
2009/1/25 Ludovic Courtès : > > I agree it would be really nice to have Unicode support, but I'm not > aware of any "plan", so please go ahead! :-) Indeed. > A few considerations regarding the inevitable debate about the internal > string representation: [...] But what about the other possible

Re: Wide strings

2009-01-25 Thread Ludovic Courtès
Hello! Mike Gran writes: > Hi.  I know there has been a lot of talk about wide characters and > Unicode over the years.  I'd like to see it happen because how they are > implemented will determine the future of a couple of my side-projects. > I could pitch in, if you needed some help. Indeed, it