Re: ICU - uneasy feeling

2005-10-14 Thread Angus Leeming
Jean-Marc Lasgouttes wrote: > Lars> Yes. But this is more on the input side and display side of > Lars> things. For storage we will have to support combining chars > even Lars> for european languages, but we don't have to do that as > step Lars> one. > > By combining chars you mean letter+accent,

Re: ICU - uneasy feeling

2005-10-14 Thread Lars Gullik Bjønnes
John Levon <[EMAIL PROTECTED]> writes: | Baby steps is definitely fine, but we should at least /aim/ to get this | stuff done in the first attempt... agree. | > Finishing this step and removing all code now rendered cruft, might | > also give us a better position to move forward with combining |

Re: ICU - uneasy feeling

2005-10-14 Thread Lars Gullik Bjønnes
Jean-Marc Lasgouttes <[EMAIL PROTECTED]> writes: | > "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes: | | Lars> Yes. But this is more on the input side and display side of | Lars> things. For storage we will have to support combining chars even | Lars> for european languages, but we

Re: ICU - uneasy feeling

2005-10-14 Thread John Levon
On Fri, Oct 14, 2005 at 02:58:01AM +0200, Lars Gullik Bjønnes wrote: > | This seems a horribly euro-centric point of view. (Says the guy who can > | only speak one language...) > > I agree with that, but I also agree with Asger, that this won't be > much worse than what we have right now. And we

Re: ICU - uneasy feeling

2005-10-14 Thread Jean-Marc Lasgouttes
> "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes: Lars> Yes. But this is more on the input side and display side of Lars> things. For storage we will have to support combining chars even Lars> for european languages, but we don't have to do that as step Lars> one. By combining chars

Re: ICU - uneasy feeling

2005-10-14 Thread Jean-Marc Lasgouttes
> "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes: Lars> I'll use it for inspiration... we cannot use it as a starting Lars> point it seems. No, but it points to many of the parts of the code that need attention. What might be a goal for 1.5.0 is european languages + hebrew + arabic +

Re: ICU - uneasy feeling

2005-10-14 Thread Lars Gullik Bjønnes
Angus Leeming <[EMAIL PROTECTED]> writes: | Lars Gullik Bjønnes wrote: | > | Very good. How does this compare to the CJK LyX patch? Did you look at | > | it? | > Where can I find the most up to date CJK LyX patch? | | ftp://ftp.u-aizu.ac.jp/pub/tex/cjk-lyx/qt/CJK-LyX-qt-1.3.6-1.patch I'll use it

Re: ICU - uneasy feeling

2005-10-14 Thread Jean-Marc Lasgouttes
> "Angus" == Angus Leeming <[EMAIL PROTECTED]> writes: Angus> Lars Gullik Bjønnes wrote: >> | Very good. How does this compare to the CJK LyX patch? Did you >> look at | it? Where can I find the most up to date CJK LyX patch? Angus> ftp://ftp.u-aizu.ac.jp/pub/tex/cjk-lyx/qt/CJK-LyX-qt-1.3.6-1

Re: ICU - uneasy feeling

2005-10-14 Thread Lars Gullik Bjønnes
Jose' Matos <[EMAIL PROTECTED]> writes: | Lars Gullik Bjønnes wrote: | | > | #LyX 1.3 created this file. For more info see http://www.lyx.org/ | > | \lyxformat 221 | > | | > | is unchanged in UTF-8. We'll just need to read the \lyxformat to | > | ascertain whether the rest of the file is encode

Re: ICU - uneasy feeling

2005-10-14 Thread Angus Leeming
Lars Gullik Bjønnes wrote: > | Very good. How does this compare to the CJK LyX patch? Did you look at > | it? > Where can I find the most up to date CJK LyX patch? ftp://ftp.u-aizu.ac.jp/pub/tex/cjk-lyx/qt/CJK-LyX-qt-1.3.6-1.patch -- Angus

Re: ICU - uneasy feeling

2005-10-14 Thread Jean-Marc Lasgouttes
> "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes: Lars> Jean-Marc Lasgouttes <[EMAIL PROTECTED]> writes: | Lars> Very good. How does this compare to the CJK LyX patch? Did you Lars> look at | it? Lars> Where can I find the most up to date CJK LyX patch? Look there maybe: http://www

Re: ICU - uneasy feeling

2005-10-14 Thread Jose' Matos
Lars Gullik Bjønnes wrote: > | #LyX 1.3 created this file. For more info see http://www.lyx.org/ > | \lyxformat 221 > | > | is unchanged in UTF-8. We'll just need to read the \lyxformat to > | ascertain whether the rest of the file is encoded in UTF-8 and, if not, > | use Python's unicode stuff t
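A minimal sketch of the idea quoted above: the header is plain ASCII, so it reads identically in UTF-8 and in the older 8-bit encodings, and the `\lyxformat` number can be parsed before deciding how to decode the rest of the file. This is not lyx2lyx code. The cut-off `ASSUMED_FIRST_UTF8_FORMAT`, the helper names, and the `latin-1` fallback are assumptions made for illustration; the thread gives neither the format number at which files become UTF-8 nor the legacy encoding.

```python
import re

# Hypothetical cut-off: the first \lyxformat number assumed to be stored as
# UTF-8. The thread does not state this number; it is a placeholder.
ASSUMED_FIRST_UTF8_FORMAT = 1000


def read_lyxformat(raw: bytes):
    """Return the integer following \\lyxformat in the header, or None."""
    # The header is pure ASCII, so it can be matched on the raw bytes
    # before any decision about the file's encoding has been made.
    match = re.search(rb"^\\lyxformat\s+(\d+)", raw, re.MULTILINE)
    return int(match.group(1)) if match else None


def decode_lyx(raw: bytes, legacy_encoding: str = "latin-1") -> str:
    """Decode a file's bytes, choosing the encoding from its format number."""
    fmt = read_lyxformat(raw)
    if fmt is not None and fmt >= ASSUMED_FIRST_UTF8_FORMAT:
        return raw.decode("utf-8")
    # Older formats: assume a single legacy 8-bit encoding (an assumption).
    return raw.decode(legacy_encoding)


sample = (
    b"#LyX 1.3 created this file. For more info see http://www.lyx.org/\n"
    b"\\lyxformat 221\n"
    b"caf\xe9\n"  # 'café' as latin-1 bytes
)
text = decode_lyx(sample)
utf8_bytes = text.encode("utf-8")  # what a converter would write back out
```

A real converter would presumably take the legacy encoding from the document's own language or encoding settings rather than from one fixed default.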

Re: ICU - uneasy feeling

2005-10-14 Thread Lars Gullik Bjønnes
Jean-Marc Lasgouttes <[EMAIL PROTECTED]> writes: | Very good. How does this compare to the CJK LyX patch? Did you look at | it? Where can I find the most up to date CJK LyX patch? -- Lgb

Re: ICU - uneasy feeling

2005-10-14 Thread Lars Gullik Bjønnes
Angus Leeming <[EMAIL PROTECTED]> writes: | Lars Gullik Bjønnes wrote: | > btw... lyxlex (and the filereading) must be adapted to read utf8, and | > lyx2lyx must do its best to translate the old formats (to utf8)... | | Well, that's trivial because the header Well, that part yes. | | #LyX 1.3

Re: ICU - uneasy feeling

2005-10-14 Thread Angus Leeming
Lars Gullik Bjønnes wrote: > btw... lyxlex (and the filereading) must be adapted to read utf8, and > lyx2lyx must do its best to translate the old formats (to utf8)... Well, that's trivial because the header #LyX 1.3 created this file. For more info see http://www.lyx.org/ \lyxformat 221 is unch

Re: ICU - uneasy feeling

2005-10-14 Thread Lars Gullik Bjønnes
Jean-Marc Lasgouttes <[EMAIL PROTECTED]> writes: | > "Jean-Marc" == Jean-Marc Lasgouttes <[EMAIL PROTECTED]> writes: | | > "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes: | Lars> True. And as I said it should give a much better base to work | Lars> from. | | Jean-Marc> And I fe

Re: ICU - uneasy feeling

2005-10-14 Thread Jean-Marc Lasgouttes
> "Jean-Marc" == Jean-Marc Lasgouttes <[EMAIL PROTECTED]> writes: > "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes: Lars> True. And as I said it should give a much better base to work Lars> from. Jean-Marc> And I fear that if we look for too much generality at once, Jean-Marc> w

Re: ICU - uneasy feeling

2005-10-14 Thread Jean-Marc Lasgouttes
> "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes: Lars> True. And as I said it should give a much better base to work Lars> from. And I fear that if we look for too much generality at once, we are going to lose track. JMarc

Re: ICU - uneasy feeling

2005-10-14 Thread Lars Gullik Bjønnes
Jean-Marc Lasgouttes <[EMAIL PROTECTED]> writes: | > "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes: | | | Lars> I just did some tests (using libidn and the nice stringprep | Lars> utility functions therein). Just by changing | Lars> Paragraph::value_type to uint32_t and adding |

Re: ICU - uneasy feeling

2005-10-14 Thread Jean-Marc Lasgouttes
> "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes: Lars> I just did some tests (using libidn and the nice stringprep Lars> utility functions therein). Just by changing Lars> Paragraph::value_type to uint32_t and adding Lars> stringprep_ucs4_to_utf8 on output, and some Lars> strigpre
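The approach described here, one 32-bit code point per stored character with conversion to UTF-8 only at the I/O boundary, can be illustrated roughly as follows. This sketch does not call libidn; it uses Python's built-in codecs to show the same round trip, and the function names `ucs4_to_utf8` and `utf8_to_ucs4` are made up for the example.

```python
def ucs4_to_utf8(code_points):
    """Encode a sequence of integer code points as UTF-8 bytes (output side)."""
    return "".join(chr(cp) for cp in code_points).encode("utf-8")


def utf8_to_ucs4(data):
    """Decode UTF-8 bytes into a list of integer code points (input side)."""
    return [ord(ch) for ch in data.decode("utf-8")]


# In memory: one fixed-width integer per code point, as with a uint32_t
# value_type. On disk: a variable-width UTF-8 byte sequence.
paragraph = [0x42, 0x6A, 0xF8, 0x6E]  # B, j, ø, n
encoded = ucs4_to_utf8(paragraph)
round_trip = utf8_to_ucs4(encoded)
```

With fixed-width storage, indexing and iteration in the core stay per code point, while the file format keeps ASCII text byte-for-byte unchanged.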

Re: ICU - uneasy feeling

2005-10-14 Thread Lars Gullik Bjønnes
Angus Leeming <[EMAIL PROTECTED]> writes: | > Also to make some of this nicer I think we need my "any-patch", I'll | > dig that out of the closet (right when 1.4.0 is released...) | > (we pass a keysym from the frontend... this is turned into a | > std::string and sent to dispatch()... we lose in

Re: ICU - uneasy feeling

2005-10-14 Thread Angus Leeming
Lars Gullik Bjønnes wrote: > I fear that XForms might need an upgrade to use either XwcLookup or > XmbLookup to give us what we require in the keyhandler (we might be > able to do it in the event handler as well, unless xforms already ate > some of our multibyte chars). Or some IM thingie (More work f

Re: ICU - uneasy feeling

2005-10-13 Thread Lars Gullik Bjønnes
John Levon <[EMAIL PROTECTED]> writes: | On Fri, Oct 14, 2005 at 12:41:32AM +0100, Angus Leeming wrote: | | > John Levon wrote: | > > This seems a horribly euro-centric point of view. (Says the guy who can | > > only speak one language...) | > | > Really? Which one? | | Northern. At least you

Re: ICU - uneasy feeling

2005-10-13 Thread Lars Gullik Bjønnes
John Levon <[EMAIL PROTECTED]> writes: | > languages if volunteers come and help out. Don't worry about composed | > Unicode glyphs for now - it's a corner case that can be handled once | > someone feels the heat (which will probably be when hell freezes over AFAICT). | | This seems a horribly eur

Re: ICU - uneasy feeling

2005-10-13 Thread John Levon
On Fri, Oct 14, 2005 at 12:41:32AM +0100, Angus Leeming wrote: > John Levon wrote: > > This seems a horribly euro-centric point of view. (Says the guy who can > > only speak one language...) > > Really? Which one? Northern. john

Re: ICU - uneasy feeling

2005-10-13 Thread Angus Leeming
John Levon wrote: > This seems a horribly euro-centric point of view. (Says the guy who can > only speak one language...) Really? Which one?

Re: ICU - uneasy feeling

2005-10-13 Thread John Levon
On Thu, Oct 13, 2005 at 10:29:30PM +0200, Asger Ottar Alstrup wrote: > The reason I suggest a unicode inset is that we already have it: the > latex accent inset. Our inset infrastructure is not in a position to accommodate something like this. > languages if volunteers come and help out. Don't w

Re: ICU - uneasy feeling

2005-10-13 Thread Lars Gullik Bjønnes
Asger Ottar Alstrup <[EMAIL PROTECTED]> writes: | Lars Gullik Bjønnes wrote: | > No. I am not sure... but it depends... a combining character can be | > used to produce accents as well... why not an umlaut on top of a | > grave on top of an 'e'. | | The reason I suggest a unicode inset is that w

Re: ICU - uneasy feeling

2005-10-13 Thread Asger Ottar Alstrup
Lars Gullik Bjønnes wrote: No. I am not sure... but it depends... a combining character can be used to produce accents as well... why not an umlaut on top of a grave on top of an 'e'. The reason I suggest a unicode inset is that we already have it: the latex accent inset. Of course you can
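The stacked-accent case raised here (an umlaut on top of a grave on top of an 'e') shows why one displayed glyph need not correspond to one code point, even with 32-bit characters. The sketch below uses Python's standard `unicodedata` module, not ICU. The helper `count_base_characters` is a made-up name and only a rough approximation that counts code points with combining class 0; it is not full grapheme-cluster segmentation.

```python
import unicodedata


def describe(text):
    """Return (code point, name, combining class) for each code point."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.combining(ch))
        for ch in text
    ]


def count_base_characters(text):
    """Rough count of displayed characters: code points with combining class 0.

    An approximation for illustration only, not grapheme-cluster segmentation.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# 'e' + COMBINING GRAVE ACCENT + COMBINING DIAERESIS: three code points,
# one character on screen.
stacked = "e\u0300\u0308"

# NFC can fold 'e' + grave into the precomposed U+00E8, but no single code
# point exists for the full stack, so the diaeresis stays a combining mark.
composed = unicodedata.normalize("NFC", stacked)

for row in describe(stacked):
    print(row)
print(len(stacked), len(composed), count_base_characters(stacked))
```

Normalizing on input reduces the number of combining sequences the storage layer sees, but it cannot remove them entirely, which is the concern raised in the thread.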

Re: ICU - uneasy feeling

2005-10-13 Thread Lars Gullik Bjønnes
Jean-Marc Lasgouttes <[EMAIL PROTECTED]> writes: | Lars> Ad. Asgers idea of a class UnicodeGlyph... (I'd prefer it to | Lars> not be an inset), we could have all chars in a Paragraph have | Lars> that type. Internally we could use some tricks to not use too | Lars> much memory for glyphs that doe

Re: ICU - uneasy feeling

2005-10-13 Thread Jean-Marc Lasgouttes
> "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes: Lars> I am not saying that we must support everything Unicode can in Lars> 1.5, but we must at least think about this. Lars> We might decide that we don't have to worry about combining Lars> chars at all (but I fear that we have to).

Re: ICU - uneasy feeling

2005-10-13 Thread Lars Gullik Bjønnes
Jean-Marc Lasgouttes <[EMAIL PROTECTED]> writes: | > "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes: | | Lars> fonts deal with glyphs (or rather the display engine), we must | Lars> deal with codepoints all the grit (which surely a lib like ICU | Lars> can help us with) | | Could y

Re: ICU - uneasy feeling

2005-10-13 Thread Jean-Marc Lasgouttes
> "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes: Lars> fonts deal with glyphs (or rather the display engine), we must Lars> deal with codepoints all the grit (which surely a lib like ICU Lars> can help us with) Could you tell me succinctly what a codepoint is? Also, what languages

Re: ICU - uneasy feeling

2005-10-13 Thread Lars Gullik Bjønnes
Angus Leeming <[EMAIL PROTECTED]> writes: | Asger Alstrup wrote: | >> | Lars Gullik Bjønnes wrote: | >> | Sure. But that's not information needed by the CORE, is it? The core | >> | does act on (strings of) single codepoints. All paragraph breaking | >> | etc, acts on single code points. | >> | >

Re: ICU - uneasy feeling

2005-10-13 Thread Lars Gullik Bjønnes
Asger Alstrup <[EMAIL PROTECTED]> writes: | Lars Gullik Bjønnes wrote: | > Angus Leeming <[EMAIL PROTECTED]> writes: | > | Lars Gullik Bjønnes wrote: | > | Sure. But that's not information needed by the CORE, is it? The core does | > | act on (strings of) single codepoints. All paragraph breaking

Re: ICU - uneasy feeling

2005-10-13 Thread Martin Vermeer
On Thu, 2005-10-13 at 01:52 +0200, Lars Gullik Bjønnes wrote: > I have been trying to look at the ICU api, but I find the > documentation utterly confusing and hard to get a clear understanding > on how it works. (Probably caused by me not finding a "Hello World" > code snippet) > > Also, I must s

Re: ICU - uneasy feeling

2005-10-13 Thread Angus Leeming
Asger Alstrup wrote: >> | Lars Gullik Bjønnes wrote: >> | Sure. But that's not information needed by the CORE, is it? The core >> | does act on (strings of) single codepoints. All paragraph breaking >> | etc, acts on single code points. >> >> How can it? When perhaps three codepoints end up beei

Re: ICU - uneasy feeling

2005-10-13 Thread Asger Alstrup
Lars Gullik Bjønnes wrote: Angus Leeming <[EMAIL PROTECTED]> writes: | Lars Gullik Bjønnes wrote: | Sure. But that's not information needed by the CORE, is it? The core does | act on (strings of) single codepoints. All paragraph breaking etc, acts on | single code points. How can it? When perha

Re: ICU - uneasy feeling

2005-10-13 Thread Lars Gullik Bjønnes
Angus Leeming <[EMAIL PROTECTED]> writes: | Lars Gullik Bjønnes wrote: | Thanks for supplying the bigger picture. I've only one point to make: | | > (Even UCS-4 is not "one-codepoint" "one-glyph", combining chars are | > required for proper display) | | Sure. But that's not information needed by

Re: ICU - uneasy feeling

2005-10-13 Thread Angus Leeming
Lars Gullik Bjønnes wrote: Thanks for supplying the bigger picture. I've only one point to make: > (Even UCS-4 is not "one-codepoint" "one-glyph", combining chars are > required for proper display) Sure. But that's not information needed by the CORE, is it? The core does act on (strings of) singl

Re: ICU - uneasy feeling

2005-10-13 Thread Asger Alstrup
Angus Leeming wrote: Lars Gullik Bjønnes wrote: Would be nice if some of you could have a look at this lib as well, and see what you think of it. I know it is _The_ Unicode lib to use, but still... I agree that ICU is bloated, complicated and antiquated. I'm not sure there is anything better o

Re: ICU - uneasy feeling

2005-10-13 Thread Lars Gullik Bjønnes
Angus Leeming <[EMAIL PROTECTED]> writes: | Lars Gullik Bjønnes wrote: | > I have been trying to look at the ICU api, but I find the | > documentation utterly confusing and hard to get a clear understanding | > on how it works. (Probably caused by me not finding a "Hello World" | > code snippet) |

Re: ICU - uneasy feeling

2005-10-12 Thread Angus Leeming
Lars Gullik Bjønnes wrote: > I have been trying to look at the ICU api, but I find the > documentation utterly confusing and hard to get a clear understanding > on how it works. (Probably caused by me not finding a "Hello World" > code snippet) > > Also, I must say, some of this is based on really

ICU - uneasy feeling

2005-10-12 Thread Lars Gullik Bjønnes
I have been trying to look at the ICU api, but I find the documentation utterly confusing and hard to get a clear understanding on how it works. (Probably caused by me not finding a "Hello World" code snippet) Also, I must say, some of this is based on really old (before 2000) ideas on how to wri