date:20010605

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Russ Allbery

Larry Wall <[EMAIL PROTECTED]> writes: > Russ Allbery writes: >> Particularly since extending UTF-8 to more than 31 bits requires >> breaking some of the guarantees that UTF-8 makes, unless I'm missing >> how you're encoding the first byte so as not to give it a value of >> 0xFE. > The UTF-16 BO

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Larry Wall

Dan Sugalski writes: : At 04:44 PM 6/5/2001 -0700, Larry Wall wrote: : >(Perl 5 extends it all the way to 64-bit values, represented in 13 bytes!) : : I know we can, but is it really a good idea? 32 bits is really stretching : it for character encoding, and 64 seems rather excessive. Such large

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Larry Wall

Russ Allbery writes: : Particularly since extending UTF-8 to more : than 31 bits requires breaking some of the guarantees that UTF-8 makes, : unless I'm missing how you're encoding the first byte so as not to give it : a value of 0xFE. The UTF-16 BOMs, 0xFEFF and 0xFFFE, both turn out to be illeg
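
The snippet above is cut off, so the following is only a sketch of what the exchange appears to be about: in the original 1-to-6-byte UTF-8 scheme (up to 31 bits), no encoded character ever contains the byte 0xFE or 0xFF, so the UTF-16 BOM byte pairs cannot be mistaken for UTF-8. The function names `encode_utf8_31bit` and `is_valid_utf8` are made up for this illustration; the encoder follows the classic bit layout and is not Perl's implementation.

```python
def encode_utf8_31bit(code_point: int) -> bytes:
    """Encode a value below 2**31 using the original 1-to-6-byte UTF-8 layout."""
    if not 0 <= code_point < 0x80000000:
        raise ValueError("value does not fit in 31 bits")
    if code_point < 0x80:
        return bytes([code_point])
    # Exclusive upper bounds for sequences with 1..5 continuation bytes.
    limits = [0x800, 0x10000, 0x200000, 0x4000000, 0x80000000]
    # Lead-byte prefixes: 110xxxxx, 1110xxxx, 11110xxx, 111110xx, 1111110x.
    prefixes = [0xC0, 0xE0, 0xF0, 0xF8, 0xFC]
    n_cont = next(i for i, limit in enumerate(limits, start=1) if code_point < limit)
    lead = prefixes[n_cont - 1] | (code_point >> (6 * n_cont))
    # Each continuation byte carries 6 payload bits behind a 10xxxxxx marker.
    tail = [0x80 | ((code_point >> (6 * i)) & 0x3F) for i in range(n_cont - 1, -1, -1)]
    return bytes([lead] + tail)


def is_valid_utf8(data: bytes) -> bool:
    """Ask Python's strict UTF-8 decoder whether the bytes are well formed."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The largest 31-bit value still gets a lead byte of 0xFD, not 0xFE.
print(hex(encode_utf8_31bit(0x7FFFFFFF)[0]))
# Both UTF-16 BOM byte orders are rejected as UTF-8.
print(is_valid_utf8(b"\xfe\xff"), is_valid_utf8(b"\xff\xfe"))
```

Going past 31 bits needs a lead byte of 0xFE or 0xFF, which seems to be the guarantee Russ is worried about losing.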

Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-05 Thread Dan Sugalski

At 07:40 AM 6/5/2001 -0700, Dave Storrs wrote: >On Tue, 5 Jun 2001, Dave Mitchell wrote: > > > dispatch loop. I'd much rather have a 'regex start' opcode which > > calls a separate dispatch loop function, and which then interprets any > > further ops in the bytestream as regex ops. That way we doub

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Dan Sugalski

At 04:44 PM 6/5/2001 -0700, Larry Wall wrote: >Dan Sugalski writes: >: Have they changed that again? Last I checked, UTF-8 was capped at 4 bytes, >: but that's in the Unicode 3.0 standard. > >Doesn't really matter where they install the artificial cap, because >for philosophical reasons Perl is go

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Jarkko Hietaniemi

On Tue, Jun 05, 2001 at 04:44:46PM -0700, Russ Allbery wrote: > NeonEdge <[EMAIL PROTECTED]> writes: > > > This is evident in the "Musical Symbols" and even "Byzantine Musical > > Symbols". Are these character sets more important than the actual > > language character sets being denied to the ot

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Simon Cozens

On Tue, Jun 05, 2001 at 04:44:46PM -0700, Russ Allbery wrote: > In the meantime, the normally-encountered working character set of modern > Asian languages has been in Unicode from the beginning, and currently the > older and rarer characters and the characters used these days only in > proper nam

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Russ Allbery

Russ Allbery <[EMAIL PROTECTED]> writes: > That's probably unnecessary; I really don't expect them to ever use all > 31 bytes that the IETF-standardized version of UTF-8 supports. 31 bits, rather. *sigh* But given that, modulo some debate over CJKV, we're getting into *really* obscure stuff al

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Russ Allbery

Larry Wall <[EMAIL PROTECTED]> writes: > Doesn't really matter where they install the artificial cap, because for > philosophical reasons Perl is gonna support larger values anyway. It's > just that 4 bytes of UTF-8 happens to be large enough to represent > anything UTF-16 can represent with sur

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Larry Wall

Dan Sugalski writes: : Have they changed that again? Last I checked, UTF-8 was capped at 4 bytes, : but that's in the Unicode 3.0 standard. Doesn't really matter where they install the artificial cap, because for philosophical reasons Perl is gonna support larger values anyway. It's just that 4

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Russ Allbery

NeonEdge <[EMAIL PROTECTED]> writes: > This is evident in the "Musical Symbols" and even "Byzantine Musical > Symbols". Are these character sets more important than the actual > language character sets being denied to the other countries? Are musical > and mathematical symbols even a language at

RE: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread NeonEdge

The problem as I see it, is not that the mechanism can't handle the languages, it is that the Latin/Gothic countries chose first, and gave what's left to the Oriental countries. This is evident in the "Musical Symbols" and even "Byzantine Musical Symbols". Are these character sets more important

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Russ Allbery

Simon Cozens <[EMAIL PROTECTED]> writes: > On Tue, Jun 05, 2001 at 03:27:03PM -0700, Russ Allbery wrote: >> Caseless characters should be guaranteed unchanged by conversion to >> upper or lower case, IMO. > I think Bryan's asking more about \p{IsUpper} than uc(). Ahh... well, Unicode classifies
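
To make the distinction between case conversion and case classification concrete, here is a small sketch using Python's built-in string methods and `unicodedata` as stand-ins for Perl's uc() and \p{IsUpper}. That substitution is an assumption for illustration only; the helper name `case_report` is hypothetical, and Perl's behaviour may differ in detail.

```python
import unicodedata


def case_report(ch: str) -> dict:
    """Summarise how one character behaves under conversion and classification."""
    return {
        "category": unicodedata.category(ch),   # e.g. 'Lu', 'Ll', 'Lo'
        "upper_unchanged": ch.upper() == ch,    # conversion leaves it alone?
        "lower_unchanged": ch.lower() == ch,
        "isupper": ch.isupper(),                # classification test
        "islower": ch.islower(),
    }


# U+0648 (Arabic waw, mentioned elsewhere in the thread) and U+4E2D are caseless.
for ch in ("A", "a", "\u0648", "\u4e2d"):
    print(f"U+{ord(ch):04X}", case_report(ch))
```

For the caseless letters, conversion returns the character unchanged while both classification tests come back false, which matches the position Russ takes above for conversion.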

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Simon Cozens

On Tue, Jun 05, 2001 at 03:27:03PM -0700, Russ Allbery wrote: > Caseless characters should be guaranteed unchanged by conversion to upper > or lower case, IMO. I think Bryan's asking more about \p{IsUpper} than uc(). -- Henry, I'm a Regent Master of the Ancient and Venerable House of Congregati

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Bryan C . Warnock

On Tuesday 05 June 2001 05:49 pm, Simon Cozens wrote: > YES. Definitely. Same Unicode character, same thing. You wanted something > else, use a different Unicode character. I don't understand. There *is* only one character. I can't choose another. Take 0x0648, for instance. It's both waw, th

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Dan Sugalski

At 03:21 PM 6/5/2001 -0700, Russ Allbery wrote: >Dan Sugalski <[EMAIL PROTECTED]> writes: > > At 12:40 PM 6/5/2001 -0700, Russ Allbery wrote: > > >> (As an aside, UTF-8 also is not an X-byte encoding; UTF-8 is a variable > >> byte encoding, with each character taking up anywhere from one to six >

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Russ Allbery

Bryan C Warnock <[EMAIL PROTECTED]> writes: > Some additional stuff to ponder over, and maybe Unicode addresses these > - I haven't been able to read *all* the Unicode stuff yet. (And, yes, > Simon, you will see me in class.) > Some languages don't have upper or lower case. Are tests and > tra

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Russ Allbery

Dan Sugalski <[EMAIL PROTECTED]> writes: > At 12:40 PM 6/5/2001 -0700, Russ Allbery wrote: >> (As an aside, UTF-8 also is not an X-byte encoding; UTF-8 is a variable >> byte encoding, with each character taking up anywhere from one to six >> bytes in the encoded form depending on where in Unicode
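
As a rough illustration of the "one to six bytes" point, the sketch below computes how many bytes the original (31-bit) UTF-8 scheme needs for a given value. The helper name `utf8_length` is made up for this example; the thresholds are the standard ones for that scheme.

```python
def utf8_length(code_point: int) -> int:
    """Return the byte count of a value in the original 1-to-6-byte UTF-8 scheme."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    # Exclusive upper bounds for 1-, 2-, 3-, 4-, 5- and 6-byte sequences.
    limits = [0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000]
    for length, limit in enumerate(limits, start=1):
        if code_point < limit:
            return length
    raise ValueError("more than 31 bits; not representable in six bytes")


for cp in (0x41, 0xE9, 0x20AC, 0x10000, 0x7FFFFFFF):
    print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")
```

Where a character sits in the code space decides its length: ASCII stays at one byte, the rest of the 16-bit range needs two or three, and only values above 0x1FFFFF need five or six.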

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Simon Cozens

On Tue, Jun 05, 2001 at 05:39:36PM -0400, Bryan C . Warnock wrote: > Some languages don't have upper or lower case. Are tests and translations > on caseless characters true or false? (Or undefined?) I'd say undefined. > Should the same Unicode character, when used in two different languages

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Bryan C . Warnock

On Tuesday 05 June 2001 03:24 pm, Dan Sugalski wrote: > > > The second objection is again related to character versus glyph > > > issues: since Chinese, > > > >I think this problem =~ locale. For any unicode character, you can not > >properly tell its lower case or upper case without considering

Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-05 Thread Graham Barr

On Tue, Jun 05, 2001 at 03:31:24PM -0500, David L. Nicol wrote: > Graham Barr wrote: > > > I think there are a lot of benefits to the re engine not to be > > separate from the core perl ops. > > > So does it start with a split(//,$bound_thing) or does it use > substr(...) with explicit offsets?

Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-05 Thread David L. Nicol

Graham Barr wrote: > I think there are a lot of benefits to the re engine not to be > separate from the core perl ops. So does it start with a split(//,$bound_thing) or does it use substr(...) with explicit offsets?

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Dan Sugalski

At 12:40 PM 6/5/2001 -0700, Russ Allbery wrote: >Bart Lateur <[EMAIL PROTECTED]> writes: > > UTF-8 is NOT limited to 16 bits (3 bytes). > >That's an odd definition of byte you have there. :) Maybe it's RAD50. :) Still, it may take 3 bytes to represent in UTF-8 a character that takes 2 bytes in
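
Dan's size comparison is easy to check with a short sketch. It uses Python's standard codecs, and the helper name `encoded_sizes` is made up for this example. The big-endian codec is chosen so that no BOM is counted in the UTF-16 length.

```python
def encoded_sizes(ch: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


samples = {
    "LATIN CAPITAL LETTER A": "A",
    "EURO SIGN": "\u20ac",
    "MUSICAL SYMBOL G CLEF": "\U0001D11E",
}
for name, ch in samples.items():
    utf8_bytes, utf16_bytes = encoded_sizes(ch)
    print(f"{name}: UTF-8 {utf8_bytes} bytes, UTF-16 {utf16_bytes} bytes")
```

The euro sign is one of the characters that costs three bytes in UTF-8 but two in UTF-16, while characters outside the 16-bit range take four bytes in both.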

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Russ Allbery

Bart Lateur <[EMAIL PROTECTED]> writes: > On 05 Jun 2001 11:07:11 -0700, Russ Allbery wrote: >> Particularly since part of his contention is that 16 bits isn't enough, >> and I think all the widely used national character sets are no more >> than 16 bits, aren't they? > It's not really important

RE: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Dan Sugalski

At 11:18 AM 6/5/2001 -0700, Hong Zhang wrote: > > Firstly, the JIS standard defines, along with the ordering and > > enumeration of its characters, their glyph shape. Unicode, on the other > > hand does not. This means that as far as Unicode is concerned, there is > > literally no distinction be

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Simon Cozens

On Tue, Jun 05, 2001 at 09:16:05PM +0200, Bart Lateur wrote: > Unicode "text" files No such animal. Unicode's a character repertoire, not an encoding. See you at my Unicode tutorial at TPC? :) -- buf[hdr[0]] = 0;/* unbelievably lazy ken (twit) */ - Andrew Hume

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Bart Lateur

On 05 Jun 2001 11:07:11 -0700, Russ Allbery wrote: >Particularly since part of his contention is that 16 bits isn't enough, >and I think all the widely used national character sets are no more than >16 bits, aren't they? It's not really important. UTF-8 is NOT limited to 16 bits (3 bytes). With

RE: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Hong Zhang

> Firstly, the JIS standard defines, along with the ordering and > enumeration of its characters, their glyph shape. Unicode, on the other > hand does not. This means that as far as Unicode is concerned, there is > literally no distinction between two distinct shapes and hence no way to > specify

Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-05 Thread Graham Barr

On Mon, Jun 04, 2001 at 06:04:10PM -0700, Larry Wall wrote: > Well, other languages have explored that option, and I think that makes > for an unnatural interface. If you think of regexes as part of a > larger language, you really want them to be as incestuous as possible, > just as any other par

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Russ Allbery

Dan Sugalski <[EMAIL PROTECTED]> writes: > It does bring up a deeper issue, however. Unicode is, at the moment, > apparently inadequate to represent at least some part of the asian > languages. Are the encodings currently in use less inadequate? I've been > assuming that an Anything->Unicode tran

RE: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Hong Zhang

> Courtesy of Slashdot, > http://www.hastingsresearch.com/net/04-unicode-limitations.shtml > > I'm not sure if this is an issue for us or not, as we're generally > language-neutral, and I don't see any technical issues with any of the > UTF-* encodings having headroom problems. I think the au

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Simon Cozens

On Tue, Jun 05, 2001 at 01:31:38PM -0400, Dan Sugalski wrote: > The other issue it actively brought up was the complaint about having to > share glyphs amongst several languages, which didn't strike me as all that > big a deal either, except perhaps as a matter of national pride and/or easy > id

RE: Stacks, registers, and bytecode. (Oh, my!)

2001-06-05 Thread Hong Zhang

> On Tue, Jun 05, 2001 at 11:25:09AM +0100, Dave Mitchell wrote: > > This is the bit that scares me about unifying perl ops and regex ops: > > can we really unify them without taking a performance hit? > > Coupl'a things: firstly, we can make Perl 6 ops as lightweight as we like. > > Second, Rub

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Dan Sugalski

At 06:22 PM 6/5/2001 +0100, Simon Cozens wrote: >On Tue, Jun 05, 2001 at 10:17:08AM -0700, Russ Allbery wrote: > > Is it just me, or does this entire article reduce not to "Unicode doesn't > > work" but "Unicode should assign more characters"? > >Yes. And Unicode has assigned more characters; it's

Re: PDD 2nd go: Conventions and Guidelines for Perl Source Code

2001-06-05 Thread Bart Lateur

On Tue, 29 May 2001 18:25:45 +0100 (BST), Dave Mitchell wrote: >diffs: > >-"K&R" style for indenting control constructs >+"K&R" style for indenting control constructs: ie the closing C<}> should >+line up with the opening C<if> etc. On Wed, 30 May 2001 10:37:06 -0400, Dan Sugalski wrote: >I realize

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Simon Cozens

On Tue, Jun 05, 2001 at 10:17:08AM -0700, Russ Allbery wrote: > Is it just me, or does this entire article reduce not to "Unicode doesn't > work" but "Unicode should assign more characters"? Yes. And Unicode has assigned more characters; it's factually challenged. -- And it should be the law: I

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Russ Allbery

Dan Sugalski <[EMAIL PROTECTED]> writes: > Courtesy of Slashdot, > http://www.hastingsresearch.com/net/04-unicode-limitations.shtml Is it just me, or does this entire article reduce not to "Unicode doesn't work" but "Unicode should assign more characters"? The presentation initially made me thi

Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Dan Sugalski

Courtesy of Slashdot, http://www.hastingsresearch.com/net/04-unicode-limitations.shtml I'm not sure if this is an issue for us or not, as we're generally language-neutral, and I don't see any technical issues with any of the UTF-* encodings having headroom problems. It does argue for abstract

Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-05 Thread Dave Storrs

On Tue, 5 Jun 2001, Dave Mitchell wrote: > dispatch loop. I'd much rather have a 'regex start' opcode which > calls a separate dispatch loop function, and which then interprets any > further ops in the bytestream as regex ops. That way we double the number > of 8-bit ops, and can have all the re
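
A toy sketch of the idea Dave Mitchell describes: a main dispatch loop that, on a "regex start" opcode, hands the byte stream to a second loop with its own opcode table. Every opcode name and number here is hypothetical, and nothing below reflects an actual Perl 6 design; it only shows how the two tables can reuse the same 8-bit values without clashing.

```python
# Hypothetical opcode tables. The numbers deliberately overlap, because each
# loop interprets bytes against its own table.
OP_PUSH, OP_REGEX_START, OP_HALT = 0, 1, 2
RX_DIGITS, RX_LITERAL, RX_END = 0, 1, 2


def run_regex(code: bytes, pc: int, text: str):
    """Secondary loop: read regex ops until RX_END; return (next pc, matched)."""
    pos, matched = 0, True
    while True:
        op = code[pc]
        pc += 1
        if op == RX_END:
            return pc, matched
        if op == RX_DIGITS:
            start = pos
            # Same shape as the C digit-scanning loop quoted in this thread.
            while pos < len(text) and text[pos].isdigit():
                pos += 1
            matched = matched and pos > start
        elif op == RX_LITERAL:
            wanted = chr(code[pc])  # one operand byte follows the opcode
            pc += 1
            if pos < len(text) and text[pos] == wanted:
                pos += 1
            else:
                matched = False
        else:
            raise ValueError(f"unknown regex op {op}")


def run(code: bytes, text: str) -> list:
    """Main loop: ordinary ops, plus a hand-off to run_regex on OP_REGEX_START."""
    pc, results = 0, []
    while True:
        op = code[pc]
        pc += 1
        if op == OP_HALT:
            return results
        if op == OP_PUSH:
            results.append(code[pc])
            pc += 1
        elif op == OP_REGEX_START:
            pc, matched = run_regex(code, pc, text)
            results.append(matched)
        else:
            raise ValueError(f"unknown op {op}")


program = bytes([OP_PUSH, 7, OP_REGEX_START, RX_DIGITS, RX_LITERAL, ord("-"), RX_END, OP_HALT])
print(run(program, "2001-06"))
print(run(program, "abc"))
```

The cost of this layout is one extra function call per regex; whether that is cheaper than a single larger dispatch table is exactly the performance question raised in the thread.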

Re: PDD 2nd go: Conventions and Guidelines for Perl Source Code

2001-06-05 Thread Dave Storrs

On Tue, 5 Jun 2001, Hugo wrote: > I'd also like to see a specification for indentation when breaking long > lines. Fwiw, the style that I prefer is: someFunc( really_long_param_1, (long_parm2 || parm3), really_long_other_param

Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-05 Thread Dave Mitchell

Simon Cozens <[EMAIL PROTECTED]> opined: > On Tue, Jun 05, 2001 at 11:25:09AM +0100, Dave Mitchell wrote: > > This is the bit that scares me about unifying perl ops and regex ops: > > can we really unify them without taking a performance hit? > > Coupl'a things: firstly, we can make Perl 6 ops as

Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-05 Thread Simon Cozens

On Tue, Jun 05, 2001 at 11:25:09AM +0100, Dave Mitchell wrote: > This is the bit that scares me about unifying perl ops and regex ops: > can we really unify them without taking a performance hit? Coupl'a things: firstly, we can make Perl 6 ops as lightweight as we like. Second, Ruby uses a giant

Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-05 Thread Dave Mitchell

Larry Wall <[EMAIL PROTECTED]> wrote: > It may certainly be valuable to (not) think of it that way, but just > don't be surprised if the regex folks come along and borrow a lot of > your opcodes to make things that look like (in C): > >while (s < send && isdigit(*s)) s++; This is the bit th

43 matches
