thing that plagues us with full Unicode case-folding. This is the
"\N{LATIN SMALL LIGATURE FFI}" =~ /(f)(f)/i
problem, amongst others. It seems that you are going to run into the
same dilemma if you allow matching partial graphemes in grapheme mode.
We can dream of :ignoreorthography or :ignoretypography, but they should
not be implemented in a regex engine.
Helmut Wollmersdorfer
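The ligature problem mentioned above can be reproduced outside Perl. The following is a minimal sketch in Python, chosen only for illustration; it assumes the standard `re` module, which applies simple (one-to-one) case mapping, and `str.casefold()`, which applies full case folding. It does not show how Perl 5 or Perl 6 behave.

```python
import re

LIGATURE = "\N{LATIN SMALL LIGATURE FFI}"  # U+FB03, a single code point

# Full case folding expands the single ligature into three letters.
folded = LIGATURE.casefold()

# Python's re module uses simple case mapping for IGNORECASE,
# so two separate "f" groups cannot match inside one code point.
direct = re.search(r"(f)(f)", LIGATURE, re.IGNORECASE)

# Matching against the folded text succeeds, but the two groups
# no longer correspond to positions in the original string.
after_fold = re.search(r"(f)(f)", folded, re.IGNORECASE)

print(repr(folded), direct, after_fold.groups())
```

The second match is the dilemma in a nutshell: both captures land "inside" one original character, so offsets reported against the folded text cannot be mapped back cleanly.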
Larry Wall wrote:
On Sun, Feb 06, 2011 at 08:59:51PM +0100, Helmut Wollmersdorfer wrote:
: Tom Christiansen wrote:
: > I'm also curious whether there are active plans to address the
: > tr18 requirements in perl6 regexes. It would be a wonderful
: > feather in perl6'
ppropriate chapters of the Unicode
standard in the specification of Perl 6. This would make Unicode
test cases reusable. An implementation should also always declare which
Unicode features it implements (and which it does not), and for which
version of Unicode.
Helmut Wollmersdorfer
ical equivalence, both of which really
require locale knowledge outside the charset itself.
Sure. The Perl 6 specs still need a lot of work on the Unicode part.
Helmut Wollmersdorfer
t
in the definition. And if a Unicode term is used, it should mean exactly
what is specified in the Unicode standard. E.g. it would be a mistake to
define graphemes by '\pX' or '(?>\PM\pM*)', as Unicode provides the
properties 'Grapheme_Base' and 'Grapheme_Extend' (unfortunately they are
not supported by Perl 5 or Perl 6).
Helmut Wollmersdorfer
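To make the objection concrete, here is a rough sketch of what a '(?>\PM\pM*)'-style definition does, written in Python for illustration. It groups each code point with the marks that follow it, using only General_Category via `unicodedata.category`. The function name `combining_sequences` is hypothetical, and this is deliberately the approximation criticised above, not an implementation of UAX #29 grapheme clusters.

```python
import unicodedata


def combining_sequences(text):
    """Group each code point with the marks (General_Category M*) after it.

    This approximates '\\PM\\pM*'; a leading mark with no base becomes
    its own group. It is NOT a UAX #29 grapheme cluster segmenter.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # attach the mark to the preceding group
        else:
            clusters.append(ch)  # start a new group
    return clusters


# "a" + COMBINING DOT BELOW + COMBINING ACUTE ACCENT, then "b"
print(combining_sequences("a\u0323\u0301b"))
# CR LF is one grapheme cluster under UAX #29, but two groups here.
print(combining_sequences("\r\n"))
```

The CR LF case shows one place where the mark-based shortcut and the Unicode definition diverge, which is why the choice of definition matters for a specification.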
ly a bug in 'unicore').
2) Syntax of non-boolean properties:
In Perl 5 e.g.
\p{BidiClass:L}  # Bidi class = Left-to-Right
\p{gc:L}         # General category = Letter
should be in Perl 6 (thx Moritz' suggestion on #perl6):
Helmut Wollmersdorfer
file,
filters the lines, and writes them back, if the result is in another
normalization form.
Helmut Wollmersdorfer
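The snippet above is cut off, so the original script is not visible. As a loosely related illustration only, the following Python sketch finds the lines whose content changes under a given normalization form, which is the check such a filter would need before writing anything back. The function name `lines_needing_rewrite` and the default form NFC are assumptions, not taken from the message.

```python
import unicodedata


def lines_needing_rewrite(lines, form="NFC"):
    """Return (line_number, normalized_line) for lines not already in `form`."""
    changed = []
    for number, line in enumerate(lines, start=1):
        normalized = unicodedata.normalize(form, line)
        if normalized != line:
            changed.append((number, normalized))
    return changed


# Line 2 holds "a" + COMBINING ACUTE ACCENT, which NFC composes to U+00E1.
print(lines_needing_rewrite(["abc", "a\u0301"]))
```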
Larry Wall wrote:
On Mon, May 18, 2009 at 11:11:32AM +0200, Helmut Wollmersdorfer wrote:
2) Can I use Unicode property matching safely with graphemes?
If yes, who or what maintains the necessary tables?
Good question. My assumption is that adding marks to a character
doesn't chang
Darren Duncan wrote:
Since you seem eager, I recommend you start with porting the Parrot PDD
28 to a new Perl 6 Synopsis 15, and continue from there.
IMHO we need some people for a broad discussion on the details first.
Helmut Wollmersdorfer
ould the definition of graphemes conform to Unicode Standard Annex
#29 'grapheme clusters'? Which level: legacy, extended or tailored?
Helmut Wollmersdorfer
AFAIR 'CharLingua' appears in two specs, maybe as a leftover from the
history of Perl 6.
Whatever the idea of 'CharLingua' was, a nice-to-have would be support
for locale-dependent processing in the sense of the Unicode CLDR:
http://cldr.unicode.org/
Helmut Wollmersdorfer
tinue=No)
rakudo: FAIL, std: FAIL
Wouldn't it be easier to reference the Unicode properties
1) ID_Start plus U+005F LOW LINE (=Underscore)
2) ID_Continue
for identifiers? That's what Unicode 'ID_x' is for.
With the nice 'side effect' that combining diacritics are in ID_Continue.
Helmut Wollmersdorfer
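For comparison, Python already defines identifiers along these lines, so it can serve as a quick illustration of the effect. Note the assumption: `str.isidentifier()` is based on XID_Start and XID_Continue (the NFKC-stable variants) plus the underscore, not on plain ID_Start and ID_Continue, so this is an analogy rather than the exact rule proposed above.

```python
# Each candidate is checked with Python's own identifier rule.
candidates = {
    "plain": "name",
    "underscore_start": "_name",
    "base_plus_mark": "a\u0323",   # a + COMBINING DOT BELOW
    "mark_first": "\u0323a",       # combining mark cannot start an identifier
    "digit_first": "1abc",
}

results = {label: text.isidentifier() for label, text in candidates.items()}

for label, ok in results.items():
    print(f"{label:17} {ok}")
```

The combining mark is accepted after a base letter but rejected in first position, which is the 'side effect' described in the message.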
-time of
the process, but these names would need to be checked for uniqueness
(performance problem).
Helmut Wollmersdorfer
asel
LATIN SMALL LETTER A, # some comment
COMBINING DOT BELOW, # thisandthat
]"
Helmut Wollmersdorfer
It's not explicitly specified whether something like
my $charname = 'SPACE';
my $string = "\c[$charname]";
should interpolate or not.
I assume 'not'. Right?
Helmut Wollmersdorfer
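The same distinction between compile-time and run-time name resolution exists in other languages. As an analogy only, and saying nothing about what Perl 6 specifies for "\c[$charname]", Python resolves `\N{...}` when the literal is compiled and needs an explicit `unicodedata.lookup()` call when the name is held in a variable.

```python
import unicodedata

# Resolved at compile time: the name must appear literally in the source.
literal = "\N{SPACE}"

# A name held in a variable has to be looked up explicitly at run time.
charname = "SPACE"
looked_up = unicodedata.lookup(charname)

# Several names can be joined the same way to build a sequence.
names = ["LATIN SMALL LETTER A", "COMBINING DOT BELOW"]
sequence = "".join(unicodedata.lookup(name) for name in names)

print(literal == looked_up, [hex(ord(ch)) for ch in sequence])
```

Keeping the literal form non-interpolating and offering a separate lookup function is one possible design; the run-time lookup raises `KeyError` for unknown names, which a compile-time literal would report as a syntax error instead.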
SITION BRACKET
Cool idea.
But if you really want to use these characters, your source will be hard
to read without exotic fonts. You have been warned;-)
Helmut Wollmersdorfer