On May 18, 2009, at 21:54 , Larry Wall wrote:
On Mon, May 18, 2009 at 07:59:31PM -0500, John M. Dlugosz wrote:
No, a few million code points in the Unicode standard can produce an
arbitrary number of unique grapheme clusters, since you can apply as
many modifiers as you like to each different ba
On Mon, May 18, 2009 at 07:59:31PM -0500, John M. Dlugosz wrote:
> No, a few million code points in the Unicode standard can produce an
> arbitrary number of unique grapheme clusters, since you can apply as
> many modifiers as you like to each different base character. If you
> allow multipl
Larry Wall larry-at-wall.org |Perl 6| wrote:
into *uint16 as long as they don't synthesize codepoints. And we can
always resort to *uint32 and *int32 knowing that the Unicode consortium
isn't going to use the top bit any time in the foreseeable future.
(Unless, of course, they endorse something
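
A rough illustration of the storage idea discussed in this message, not the actual Perl 6 or Parrot implementation: a cluster that normalizes to one code point keeps that code point, and any other cluster is given a synthetic negative integer that fits a signed 32-bit slot. The `GraphemeTable` class and its method names are hypothetical, and using NFC as the normal form is an assumption.

```python
import unicodedata


class GraphemeTable:
    """Hypothetical table mapping grapheme clusters to single integers."""

    def __init__(self):
        self._ids = {}        # normalized cluster -> synthetic negative id
        self._clusters = []   # position i holds the cluster with id -(i + 1)

    def encode(self, cluster: str) -> int:
        # Assumption: NFC is the canonical form used for comparison.
        norm = unicodedata.normalize("NFC", cluster)
        if len(norm) == 1:
            return ord(norm)  # a real code point, never negative
        if norm not in self._ids:
            self._clusters.append(norm)
            self._ids[norm] = -len(self._clusters)
        return self._ids[norm]

    def decode(self, value: int) -> str:
        if value >= 0:
            return chr(value)
        return self._clusters[-value - 1]


table = GraphemeTable()
composed = table.encode("e\u0301")         # e + acute composes to U+00E9
synthetic = table.encode("a\u0323\u0307")  # a + dot below + dot above
```

The table only grows, which is where the concern about hostile input filling the translation tables comes from.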
Larry Wall larry-at-wall.org |Perl 6| wrote:
Sure, but this is a weak argument, since you can already write complete
ord/chr nonsense at the codepoint level (even in ASCII), and all we're
doing here is making graphemes work more like codepoints in terms of
storage and indexing. If people abuse i
Mark J. Reed markjreed-at-gmail.com |Perl 6| wrote:
On Mon, May 18, 2009 at 9:11 AM, Austin Hastings
wrote:
If you haven't read the PDD, it's a good start.
I get all that, really. I still question the necessity of mapping
each grapheme to a single integer. A single *value*, sure.
Author: moritz
Date: 2009-05-18 23:08:54 +0200 (Mon, 18 May 2009)
New Revision: 26876
Modified:
docs/Perl6/Spec/S02-bits.pod
docs/Perl6/Spec/S09-data.pod
Log:
[S02] get rid of the each() comprehension
[S09] document speculative each() junction with grep semantics
Modified: docs/Perl6/Spec/S
Larry Wall wrote:
Which is a very interesting topic, with connections to type theory,
scope/domain management, and security issues (such as the possibility
of a DoS attack on the translation tables).
I think that a DoS attack on Unicode would be called "IBM/Windows Code
Pages." The rest of
Brandon S. Allbery KF8NH wrote:
On May 18, 2009, at 14:16 , Larry Wall wrote:
On Mon, May 18, 2009 at 11:11:32AM +0200, Helmut Wollmersdorfer wrote:
3) Details of 'life-time', round-trip.
Which is a very interesting topic, with connections to type theory,
scope/domain management, and security
On May 18, 2009, at 14:16 , Larry Wall wrote:
On Mon, May 18, 2009 at 11:11:32AM +0200, Helmut Wollmersdorfer wrote:
3) Details of 'life-time', round-trip.
Which is a very interesting topic, with connections to type theory,
scope/domain management, and security issues (such as the possibility
On Sun, May 17, 2009 at 07:41:45PM +0200, Moritz Lenz wrote:
: Hi,
:
: (sorry for yet another p6l email mentioning junctions; if they annoy you
: just ignore this mail :-)
:
: while reviewing some tests I found the "each() comprehension" in S02
: that evaded my attention so far.
:
: Do we really
On Mon, May 18, 2009 at 02:16:17PM -0400, Mark J. Reed wrote:
: Surrogates are just weird, since they have assigned code points even
: though they're purely an encoding mechanism. As such, they straddle
: the line between abstract characters and an encoding form. I assume
: that if text comes in a
On Mon, May 18, 2009 at 11:11:32AM +0200, Helmut Wollmersdorfer wrote:
> [1] Open questions:
>
> 1) Will graphemes have a unique charname?
> e.g. GRAPHEME LATIN SMALL LETTER A WITH DOT BELOW AND DOT ABOVE
Yes, presumably that comes with the "normalization" part of NFG.
We're not aiming for rou
> On Mon, May 18, 2009 at 12:37:49PM -0400, Brandon S. Allbery KF8NH wrote:
>> I would argue that if you are working with a grapheme cluster
>> ("grapheme"), arithmetic on individual grapheme values is undefined.
Yup, that was exactly what I was arguing.
>> In short, I think the only remotely san
On Mon, May 18, 2009 at 12:37:49PM -0400, Brandon S. Allbery KF8NH wrote:
> On May 18, 2009, at 09:21 , Mark J. Reed wrote:
>> If you're doing arithmetic with the code points or scalar values of
>> characters, then the specific numbers would seem to matter. I'm
>
>
> I would argue that if you are
On May 18, 2009, at 09:21 , Mark J. Reed wrote:
If you're doing arithmetic with the code points or scalar values of
characters, then the specific numbers would seem to matter. I'm
I would argue that if you are working with a grapheme cluster
("grapheme"), arithmetic on individual grapheme v
On Sun, May 17, 2009 at 09:35:50PM +0200, Moritz Lenz wrote:
: Hi,
:
: t/oo/value_types.t mentions the "is value" trait, which doesn't appear
: in the spec anywhere. According to the discussion in [1] there was
: speculation about 'is cow' and 'is value', but the former didn't seem to
: enter the
On Mon, May 18, 2009 at 07:01:27AM +0200, pugs-comm...@feather.perl6.nl wrote:
: Author: jdlugosz
: Date: 2009-05-18 07:01:27 +0200 (Mon, 18 May 2009)
: New Revision: 26868
:
: Modified:
: docs/Perl6/Spec/S03-operators.pod
: Log:
: Fix one typo, s/know/known/. Really just low-hanging fruit to
Mark J. Reed wrote:
On Mon, May 18, 2009 at 9:11 AM, Austin Hastings
wrote:
If you haven't read the PDD, it's a good start.
I get all that, really. I still question the necessity of mapping
each grapheme to a single integer. A single *value*, sure.
length($weird_grapheme) should a
On Mon, May 18, 2009 at 9:11 AM, Austin Hastings
wrote:
> If you haven't read the PDD, it's a good start.
I get all that, really. I still question the necessity of mapping
each grapheme to a single integer. A single *value*, sure.
length($weird_grapheme) should always be 1, absolutely. But w
If you haven't read the PDD, it's a good start.
To summarize, probably oversimplifying badly:
1. A grapheme is a character *as seen on the page.* That is, if
composing "a" + "dot above" + "dot below" produces an a with dots above
and below it, then THAT is the grapheme.
2. Unicode has a lot
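
To make point 1 concrete, here is a small sketch of the difference between counting code points and counting graphemes. It is a deliberate simplification: it attaches every combining mark to the preceding character and ignores the rest of the Unicode segmentation rules (Hangul, ZWJ sequences and so on). The function name `simple_clusters` is made up for the example.

```python
import unicodedata


def simple_clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        # combining() returns a nonzero class for combining marks.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


s = "a\u0307\u0323b"           # a + dot above + dot below, then b
clusters = simple_clusters(s)
code_point_count = len(s)       # counts code points
grapheme_count = len(clusters)  # counts what a reader sees on the page
```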
Do we really need to be able to map arbitrary graphemes to integers,
or is it enough to have an opaque value returned by ord() that, when
fed to chr(), returns the same grapheme? If the latter, a list of
code points (in one of the official Normalization Forms) would seem
to be sufficient.
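
A minimal sketch of that alternative, assuming NFC as the chosen normalization form: the "ord" of a grapheme is an opaque tuple of code points rather than one integer, and the matching "chr" rebuilds the grapheme from it. The names `grapheme_ord` and `grapheme_chr` are hypothetical and are not part of any spec.

```python
import unicodedata


def grapheme_ord(grapheme: str) -> tuple[int, ...]:
    """Return an opaque key: the code points of the NFC-normalized grapheme."""
    return tuple(ord(c) for c in unicodedata.normalize("NFC", grapheme))


def grapheme_chr(key: tuple[int, ...]) -> str:
    """Rebuild the grapheme from a key produced by grapheme_ord."""
    return "".join(chr(cp) for cp in key)


g = "a\u0307\u0323"  # a + dot above + dot below
key = grapheme_ord(g)
round_trip = grapheme_chr(key)
```

Because the key is built after normalization, canonically equivalent spellings of the same grapheme yield the same key, and the key is hashable and comparable without being a single number.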
On 5/1
Darren Duncan wrote:
Since you seem eager, I recommend you start with porting the Parrot PDD
28 to a new Perl 6 Synopsis 15, and continue from there.
IMHO we need some people for a broad discussion on the details first.
Helmut Wollmersdorfer
John M. Dlugosz wrote:
I was going over S02, and found it opens with, "By default Perl presents
Unicode in "NFG" formation, where each grapheme counts as one character."
I looked up NFG, and found it to be an invention of this group, but
didn't find any details when I tried to chase down the l
23 matches