from:"Helmut Wollmersdorfer"

Re: UCA and NFC/NFD issues in pattern matching

2011-03-06 Thread Helmut Wollmersdorfer

thing that plagues us with full Unicode case-folding. This is the "\N{LATIN SMALL LIGATURE FFI}" =~ /(f)(f)/i problem, amongst others. Seems that you are going to get into the same dilemma if you allow matching partial graphemes in grapheme mode. We can dream of :ignoreorthography or :ignoretypography, but they should not be implemented in a regex engine. Helmut Wollmersdorfer
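The ligature problem mentioned in this snippet can be reproduced outside Perl. The following is only a sketch in Python, not the behaviour of any Perl 5 or Perl 6 engine: it assumes Python's `str.casefold()` (full case folding) and the standard `re` module (which maps case one-to-one), and the variable names are made up for illustration.

```python
import re

# U+FB03 is a single code point that full case folding expands to three.
LIGATURE_FFI = "\N{LATIN SMALL LIGATURE FFI}"

# Full case folding: one code point becomes "f", "f", "i".
folded = LIGATURE_FFI.casefold()

# Python's re module maps case one-to-one, so the unfolded ligature
# does not match a pattern that asks for two separate letters.
match_raw = re.search(r"(f)(f)", LIGATURE_FFI, re.IGNORECASE)

# After folding, the pattern matches, but each capture group now holds
# a fragment of what was a single character in the input.
match_folded = re.search(r"(f)(f)", folded, re.IGNORECASE)

print(len(LIGATURE_FFI), len(folded))
print(match_raw, match_folded.groups())
```

The dilemma is visible in the last line: either the engine refuses to match inside the ligature, or it matches and has to report captures that do not correspond to whole characters of the original string.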

Re: Perl6 regexes and UTS#18

2011-02-09 Thread Helmut Wollmersdorfer

Larry Wall wrote: On Sun, Feb 06, 2011 at 08:59:51PM +0100, Helmut Wollmersdorfer wrote: : Tom Christiansen wrote: : > I'm also curious whether there are active plans to address the : > tr18 requirements in perl6 regexes. It would be a wonderful : > feather in perl6'

Re: Perl6 regexes and UTS#18

2011-02-06 Thread Helmut Wollmersdorfer

ppropriate chapters of the Unicode standard in the specification of Perl6. This would make Unicode test-cases reusable. And an implementation should always declare which features of Unicode are implemented (and which are not) in which version of Unicode. Helmut Wollmersdorfer

Re: Perl6 and "accents"

2010-05-18 Thread Helmut Wollmersdorfer

ical equivalence, both of which really require locale knowledge outside the charset itself. Sure. The specs of Perl 6 still need huge work on the Unicode part. Helmut Wollmersdorfer

Re: Perl6 and "accents"

2010-05-18 Thread Helmut Wollmersdorfer

t in the definition. And if a Unicode term is used, it should mean exactly what is specified in the Unicode standard. E.g. it would be a mistake if graphemes were defined by '\pX' or '(?>\PM\pM*)', as Unicode provides the properties 'Grapheme_Base' and 'Grapheme_Extend' (unfortunately they are not supported by Perl 5 or Perl 6). Helmut Wollmersdorfer
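To see why a mark-based definition falls short, here is a rough Python sketch of the '\PM\pM*' idea. It is an approximation written for illustration only: `mark_clusters` is a hypothetical helper, it uses the General_Category from the standard `unicodedata` module (which does not expose Grapheme_Base or Grapheme_Extend), and unlike the regex it lets a leading mark form a cluster of its own.

```python
import unicodedata

def mark_clusters(text):
    # Approximate the "non-mark followed by marks" pattern: attach every
    # code point whose General_Category starts with "M" to the previous
    # cluster, and start a new cluster for everything else.
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and clusters:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# Base letter plus combining mark: grouped as expected.
with_mark = mark_clusters("a\N{COMBINING DOT BELOW}b")

# CR LF and conjoining Hangul jamo are single grapheme clusters in
# UAX #29, but contain no marks, so this approximation splits them.
crlf = mark_clusters("\r\n")
jamo = mark_clusters("\u1100\u1161")

print(len(with_mark), len(crlf), len(jamo))
```

The last two cases are split in two by the mark-based rule, although UAX #29 treats each of them as one grapheme cluster.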

S05 Regex - Unicode properties

2009-11-23 Thread Helmut Wollmersdorfer

ly a bug in 'unicore'). 2) Syntax of non-boolean properties: In Perl 5 e.g. \p{BidiClass:L} # Left-to-Right \p{gc:L} # General category = Letter should be in Perl 6 (thx Moritz' suggestion on #perl6): Helmut Wollmersdorfer

Re: Does a string remember all Unicode levels?

2009-08-12 Thread Helmut Wollmersdorfer

file, filters the lines, and writes them back, if the result is in another normalization form. Helmut Wollmersdorfer

Re: "Unicode in 'NFG' formation" ?

2009-05-20 Thread Helmut Wollmersdorfer

Larry Wall wrote: On Mon, May 18, 2009 at 11:11:32AM +0200, Helmut Wollmersdorfer wrote: 2) Can I use Unicode property matching safely with graphemes? If yes, who or what maintains the necessary tables? Good question. My assumption is that adding marks to a character doesn't chang

Re: "Unicode in 'NFG' formation" ?

2009-05-18 Thread Helmut Wollmersdorfer

Darren Duncan wrote: Since you seem eager, I recommend you start with porting the Parrot PDD 28 to a new Perl 6 Synopsis 15, and continue from there. IMHO we need some people for a broad discussion on the details first. Helmut Wollmersdorfer

Re: "Unicode in 'NFG' formation" ?

2009-05-18 Thread Helmut Wollmersdorfer

ould the definition of graphemes conform to Unicode Standard Annex #29 'grapheme clusters'? Which level - legacy, extended or tailored? Helmut Wollmersdorfer

CharLingua and Unicode locales

2009-05-04 Thread Helmut Wollmersdorfer

AFAIR in two Specs 'CharLingua' appears as - maybe - a leftover from the history of Perl 6. Whatever the idea of 'CharLingua' was, something nice-to-have would be support of locale-dependent processing in the sense of Unicode http://cldr.unicode.org/ Helmut Wollmersdorfer

S02 Names - Alphabetic?

2009-05-04 Thread Helmut Wollmersdorfer

tinue=No) rakudo: FAIL, std: FAIL Wouldn't it be easier to reference the Unicode properties 1) ID_Start plus U+005F LOW LINE (=Underscore) 2) ID_Continue for identifiers? That's what Unicode 'ID_x' is for. With the nice 'side effect' that combining diacritics are in ID_Continue. Helmut Wollmersdorfer
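As an illustration of the identifier rule proposed here, Python happens to follow a close variant: `str.isidentifier()` accepts an underscore or XID_Start character first, then XID_Continue characters. Note the assumption: XID_Start/XID_Continue are the NFKC-stable variants of ID_Start/ID_Continue, not the exact properties named in the message, and the sample strings are invented for this sketch.

```python
# Sample strings, keyed by what they are meant to probe.
candidates = {
    "plain": "abc",
    "underscore_start": "_abc",
    "mark_continue": "a\N{COMBINING DOT BELOW}",  # mark after a letter
    "mark_start": "\N{COMBINING DOT BELOW}a",     # mark in first position
    "digit_start": "1abc",
}

# str.isidentifier() applies the start/continue rule to each string.
results = {label: s.isidentifier() for label, s in candidates.items()}

for label, ok in results.items():
    print(label, ok)
```

The combining diacritic is accepted in continue position and rejected in start position, which is the 'side effect' described in the message.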

.to_charnames() and .from_charnames()

2009-04-27 Thread Helmut Wollmersdorfer

-time of the process, but these names would need to be checked for uniqueness (performance problem). Helmut Wollmersdorfer

Whitespace in \c[...], \x[...], etc.

2009-04-27 Thread Helmut Wollmersdorfer

asel LATIN SMALL LETTER A, # some comment COMBINING DOT BELOW, # thisandthat ]" Helmut Wollmersdorfer
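What a whitespace- and comment-tolerant list of character names could mean can be sketched in Python. This is not Perl 6 syntax or semantics: `from_charnames` is a hypothetical helper (the name is borrowed from the neighbouring thread title), the comma-separated layout with '#' comments is assumed from the visible fragment, and name resolution relies on `unicodedata.lookup` from the standard library.

```python
import unicodedata

def from_charnames(spec):
    # Build a string from a comma-separated list of Unicode character
    # names that may span several lines and carry "#" comments.
    chars = []
    for line in spec.splitlines():
        line = line.split("#", 1)[0]          # drop a trailing comment
        for name in line.split(","):
            name = " ".join(name.split())     # normalise inner whitespace
            if name:
                chars.append(unicodedata.lookup(name))
    return "".join(chars)

spec = """
    LATIN SMALL LETTER A,   # some comment
    COMBINING DOT BELOW,    # thisandthat
"""

result = from_charnames(spec)
print([hex(ord(c)) for c in result])
```

Here the two names resolve to U+0061 followed by U+0323; an unknown name makes `unicodedata.lookup` raise `KeyError`.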

Interpolation of "\c[$charname]"?

2009-04-27 Thread Helmut Wollmersdorfer

It's not explicitly specified whether something like my $charname = 'SPACE'; my $string = "\c[$charname]"; should interpolate or not. I assume 'not'. Right? Helmut Wollmersdorfer
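For comparison only, Python draws the same line between compile-time and run-time name resolution: its "\N{...}" escape is resolved when the literal is compiled and never interpolates a variable, while a name held in a variable has to go through `unicodedata.lookup`. The sketch below assumes nothing about how Perl 6 resolves the question; the variable names are illustrative.

```python
import unicodedata

charname = "SPACE"

# Name known when the source is compiled: resolved inside the literal.
literal = "\N{SPACE}"

# Name known only at run time: resolved by an explicit lookup call.
from_name = unicodedata.lookup(charname)

# The reverse direction, from character back to its name.
round_trip = unicodedata.name(from_name)

print(literal == from_name, round_trip)
```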

Re: Unicode bracketing spec question

2009-04-23 Thread Helmut Wollmersdorfer

SITION BRACKET Cool idea. But if you really want to use these characters, your source will be hard to read without exotic fonts. You have been warned;-) Helmut Wollmersdorfer

16 matches
