For reference, here's how Perl 5.8 will define \p{IsFoo} character
classes:
# 005F: SPACING UNDERSCORE
['IsWord', '$cat =~ /^[LMN]/ or $code eq "005F"', ''],
['IsAlnum', '$cat =~ /^[LMN]/',''],
['IsAlpha', '$cat =~ /^[LM]/', ''],
# 0009: HORIZONTAL TABULATION
#
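The table rows above are Perl expressions evaluated once per code point, with $cat holding the Unicode general category and $code the four-digit hex code point. The following is a minimal sketch that applies the same three rules to individual code points. It assumes the core Unicode::UCD module's charinfo() as the source of those two values, and classify() is a hypothetical helper, not part of Perl 5.8.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Hypothetical helper: evaluate the IsWord/IsAlnum/IsAlpha rules quoted
# above for one code point, taking $cat and $code from charinfo().
sub classify {
    my ($cp) = @_;
    my $info = charinfo($cp) or return ();   # undef for unassigned code points
    my ($cat, $code) = @$info{qw(category code)};
    my %is;
    $is{IsWord}  = ($cat =~ /^[LMN]/ or $code eq "005F") ? 1 : 0;
    $is{IsAlnum} = $cat =~ /^[LMN]/ ? 1 : 0;
    $is{IsAlpha} = $cat =~ /^[LM]/  ? 1 : 0;
    return %is;
}

for my $cp (0x41, 0x35, 0x5F, 0x20) {
    my %is = classify($cp);
    printf "U+%04X %-3s word=%d alnum=%d alpha=%d\n",
        $cp, charinfo($cp)->{category}, @is{qw(IsWord IsAlnum IsAlpha)};
}
```

Under these rules U+005F (category Pc) counts as a word character only because of the explicit 005F exception. It is neither IsAlnum nor IsAlpha.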
On Wednesday 13 June 2001 12:23 am, Jarkko Hietaniemi wrote:
> > RE Feature   Override   Create New
> >
> > switches     'i' only   yes
> > anchors      no         no
>
> (I would call them assertions.) Bzzt.
>
Another gig for Bean.
> >
> RE Feature   Override   Create New
>
> switches     'i' only   yes
> anchors      no         no
(I would call them assertions.) Bzzt.
> - Anchors. ^, $, \A, \Z, \z, \b, \G. Since the definition of a line (see 'm'
> and 's' above) isn't
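The anchor list above is where the definition of a line matters. This small sketch shows how current Perl 5 treats a string with one embedded and one trailing newline. \Z and $ may match just before a final newline, while \z matches only at the very end. ^ matches after interior newlines only under /m, and \A never does.

```perl
use strict;
use warnings;

my $s = "foo\nbar\n";
my %r;

# \Z and $ (without /m) may match before a trailing newline; \z may not.
$r{'bar\Z'} = $s =~ /bar\Z/ ? 1 : 0;
$r{'bar$'}  = $s =~ /bar$/  ? 1 : 0;
$r{'bar\z'} = $s =~ /bar\z/ ? 1 : 0;

# ^ means start of string unless /m turns it into start of line.
$r{'^bar'}   = $s =~ /^bar/  ? 1 : 0;
$r{'^bar/m'} = $s =~ /^bar/m ? 1 : 0;

# \A ignores /m: it always means start of string.
$r{'\Abar/m'} = $s =~ /\Abar/m ? 1 : 0;

printf "%-8s %d\n", $_, $r{$_} for sort keys %r;
```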
On Tuesday 12 June 2001 10:58 pm, Bryan C. Warnock wrote:
> On Tuesday 12 June 2001 09:16 pm, Simon Cozens wrote:
> > On Tue, Jun 12, 2001 at 05:41:40PM -0700, Hong Zhang wrote:
> > > We should let an external collator handle all these fancy features.
> >
> > Phew, I've been saying this all along.
> I think, following my line of thought, that [a-\N{KATAKANA LETTER KI}]
> should be equivalent to [\x{0061}-\x{30AD}], which would match any of
I think it should be an error. If you mean the code points write the
code points. Mixing symbolic names (KATAKANA LETTER KI) and native
characters (th
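Bryan's equivalence is easy to check, because a range whose endpoint is given by name is still a range of code points. This sketch looks up the name with charnames::vianame() and writes the class using the resulting \x{30AD}. The test characters are arbitrary picks.

```perl
use strict;
use warnings;
use charnames ();

# Resolve the symbolic name used in the thread to its code point.
my $ki = charnames::vianame("KATAKANA LETTER KI");
printf "KATAKANA LETTER KI = U+%04X\n", $ki;

# [a-\N{KATAKANA LETTER KI}] is the code-point range U+0061..U+30AD.
my $class = qr/[a-\x{30AD}]/;

my %hit = map { sprintf("U+%04X", ord $_) => ($_ =~ $class ? 1 : 0) }
          ("a", "z", "A", "\x{00E9}", "\x{3042}", "\x{30AE}");

printf "%s %d\n", $_, $hit{$_} for sort keys %hit;
printf "range size: %d code points\n", $ki - ord("a") + 1;
```

The class covers every code point from 'a' up to U+30AD. That includes é (U+00E9) and the whole hiragana block (HIRAGANA LETTER A, U+3042, matches), but not 'A' (U+0041) or KATAKANA LETTER GI (U+30AE). That breadth is presumably why Jarkko would rather such mixed ranges be an error.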
Jarkko Hietaniemi <[EMAIL PROTECTED]> writes:
> > Perl came from ASCII-centric roots, so it's likely that most of our
> > biases are ASCII-centric. And for a couple of reasons, it's going to
> > be hard to deal with that:
> >
> > 1. Backwards compatibility with existing Perl practice,
> >
> >
On Tuesday 12 June 2001 11:06 pm, Jarkko Hietaniemi wrote:
> > I. Make ranges work on Unicode code-points (if they don't already).
>
> U, yes, they do, if you by code-point ranges mean \x{...}-\x{...}
> but in general I would like to discourage the use of ranges. What do
> you think [a-\N{KAT
> Perl came from ASCII-centric roots, so it's likely that most of our
> biases are ASCII-centric. And for a couple of reasons, it's going to
> be hard to deal with that:
>
> 1. Backwards compatibility with existing Perl practice,
>
> and
>
> 2. To do language-neutral right is -really- hard; lo
On Tuesday 12 June 2001 09:16 pm, Simon Cozens wrote:
> On Tue, Jun 12, 2001 at 05:41:40PM -0700, Hong Zhang wrote:
> > We should let an external collator handle all these fancy features.
>
> Phew, I've been saying this all along. :)
I think we've *all* been saying that. We just need to determin
Dan Sugalski <[EMAIL PROTECTED]> writes:
>
> We probably also ought to answer the question "How accommodating to
> non-latin writing systems are we going to be?" It's an uncomfortable
> question, but one that needs asking. Answering by Larry, probably, but
> definitely asking. Perl's not real
We've pretty much run this subthread out of Perl content by now, so it
ought to stop here, and I should start exercising some of that
"restraint" thing. (Does it grow if you exercise it?)
So Damien, we can take it to private mail or to sci.lang.japan or something,
but if you promise to stop diggi
On Tue, Jun 12, 2001 at 06:45:31PM -0700, Damien Neil wrote:
> > Hrm, no, not usually; furigana are almost always hiragana, and
> > learner's textbooks - bah, they're not real Japanese. :)
>
> I believe you are confused;
*cough*. I believe I am not. But who am I? Let's ask Kenkyusha -
admittedly
On Wed, Jun 13, 2001 at 02:15:16AM +0100, Simon Cozens wrote:
> Or we could keep it out of core. It's up to you, really.
No, it isn't. It's up to Larry, or to whoever gets the regex
pumpkin.
I'm withdrawing from this discussion: My intent was to clarify
exactly why someone might want to treat K
On Tue, Jun 12, 2001 at 06:44:02PM -0400, Dan Sugalski wrote:
> We probably also ought to answer the question "How accommodating to
> non-latin writing systems are we going to be?"
What if Perl 6 simply reserved tags for extensions? This could assume
processing similar to Perl 5 for compatibility
On Tue, Jun 12, 2001 at 05:41:40PM -0700, Hong Zhang wrote:
> We should let an external collator handle all these fancy features.
Phew, I've been saying this all along. :)
> Please note regex is O(n) at best, adding an external collator
> will make it O(2n).
While this is very true, I think con
On Tue, Jun 12, 2001 at 05:40:32PM -0700, Damien Neil wrote:
> The ability to match Hiragana as Katakana and vice-versa is almost
> identical conceptually to the ability to perform case insensitive
> matches on English text.
I am going to choose not to disagree with you on this, but...
> > What
On Tue, Jun 12, 2001 at 05:41:40PM -0700, Hong Zhang wrote:
>
> We should let an external collator handle all these fancy features.
> People can always normalize/canonicalize/do-whatever-you-want
> and send the result text/binary to regex. All the features we
> argue about here can be easily done
On Wed, Jun 13, 2001 at 01:22:32AM +0100, Simon Cozens wrote:
> I'd say it was about as useful as providing a regexp option to translate
> the search term into French and try that instead.[1] Handy, possibly.
> Essential? No. Something that should be part of the core? I'll leave
> that for you to
We should let an external collator handle all these fancy features.
People can always normalize/canonicalize/do-whatever-you-want
and send the result text/binary to regex. All the features we
argue about here can be easily done by a customized collator.
Do NOT expect the Perl regex to be a linguist
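Hong's suggestion amounts to normalizing text before it ever reaches the regex engine. The following is a minimal sketch of such a preprocessing step using the core Unicode::Normalize module. NFKD decomposes accented and compatibility characters, the nonspacing marks (\p{Mn}) are dropped, and NFC recomposes what is left. prepare() is a hypothetical name, and stripping accents is only one of many policies an external collator might apply.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKD NFC);

# Hypothetical preprocessing step standing in for an "external collator".
sub prepare {
    my ($s) = @_;
    my $t = NFKD($s);      # decompose: e-acute -> e + U+0301, fullwidth A -> A
    $t =~ s/\p{Mn}+//g;    # drop nonspacing marks such as U+0301
    return NFC($t);        # recompose whatever remains
}

my @texts = (
    "caf\x{00E9}",                          # "cafe" spelled with e-acute
    "CAFE",
    "\x{FF23}\x{FF21}\x{FF26}\x{FF25}",     # fullwidth C A F E
    "coffee",
);

# The regex itself stays simple; the Unicode work happened in prepare().
my @hits = map { prepare($_) =~ /cafe/i ? 1 : 0 } @texts;
print "@hits\n";
```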
On Tue, Jun 12, 2001 at 05:03:17PM -0700, Damien Neil wrote:
> I can say that I feel that providing a mechanism for Hiragana
> characters to match Katakana and vice-versa is about as useful for a
> person doing Japanese text processing as case-insensitive matching is
> for a person working with En
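Damien's analogy can be made concrete. Katakana U+30A1..U+30F6 and hiragana U+3041..U+3096 line up one-to-one, 0x60 apart, so a kana-insensitive comparison can fold one script onto the other in the same way that lc() folds case. The following is a minimal sketch. kana_fold() is a hypothetical helper, and it ignores half-width katakana and any katakana outside that range.

```perl
use strict;
use warnings;

# Hypothetical helper: map katakana U+30A1..U+30F6 onto hiragana
# U+3041..U+3096 (two runs of equal length, offset by 0x60).
sub kana_fold {
    my ($s) = @_;
    (my $t = $s) =~ tr/\x{30A1}-\x{30F6}/\x{3041}-\x{3096}/;
    return $t;
}

my $katakana = "\x{30AB}\x{30BF}\x{30AB}\x{30CA}";   # KA TA KA NA in katakana
my $hiragana = "\x{304B}\x{305F}\x{304B}\x{306A}";   # ka ta ka na in hiragana

# Folding both sides makes the comparison kana-insensitive,
# just as lc() on both sides makes it case-insensitive.
my $same_kana = kana_fold($katakana) eq kana_fold($hiragana) ? 1 : 0;
my $same_case = lc("Perl") eq lc("PERL") ? 1 : 0;

# The same idea inside a match: fold the subject, write the pattern in hiragana.
my $matched = kana_fold($katakana) =~ /^\x{304B}\x{305F}/ ? 1 : 0;

print "kana-insensitive eq: $same_kana\n";
print "case-insensitive eq: $same_case\n";
print "folded match:        $matched\n";
```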
On Tue, Jun 12, 2001 at 06:44:02PM -0400, Dan Sugalski wrote:
> While that's true, KATAKANA LETTER A and HIRAGANA LETTER A are also
> referring to distinct things. (Though arguably not as distinct as either
> with LATIN CAPITAL A) If we do one, why not the other? I'm perfectly happy
> with an a
At 03:12 PM 6/11/2001 -0700, Damien Neil wrote:
>On Mon, Jun 11, 2001 at 05:03:26PM -0400, Dan Sugalski wrote:
> > I don't think just /i should do that, as it seems rather extreme. (If you
> > took that argument, it would seem to follow that KATAKANA LETTER A matches
> > LATIN CAPITAL A, and I don
On Tue, Jun 12, 2001 at 06:12:35PM -0400, Dan Sugalski wrote:
> At the moment I'm leaning towards the functions doing their own decoding,
> as it seems likely to be faster. (Though we'd be duplicating the decoding
> logic everywhere, and bigger's reasonably bad) Possibly mandating shadow
> func
--- Dan Sugalski <[EMAIL PROTECTED]> wrote:
> 'Kay, here's a question to ponder. Should the op dispatch loop handle
> argument decoding, or should that be left to the opcode functions?
[good analysis of trade-offs snipped]
> At the moment I'm leaning towards the functions doing their own
'Kay, here's a question to ponder. Should the op dispatch loop handle
argument decoding, or should that be left to the opcode functions?
The upside to the functions handling the decoding is they can special-case
it. makeref (a hypothetical "make a reference to a PMC" operator), for
example, wo
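Dan's question contrasts two designs, and a toy model makes the difference visible. The sketch below is a hypothetical register machine, not Parrot's actual design. The opcode names, the operand-kind strings, and the register layout are all invented for illustration. In the first style, the loop decodes operands using a per-op description. In the second, each op decodes its own raw operands, which duplicates the decoding logic but lets an op such as the hypothetical makeref take a register slot instead of its value.

```perl
use strict;
use warnings;

my @reg;   # register file shared by both toy interpreters

# Style 1: the dispatch loop decodes. Each op declares operand kinds
# ('d' = destination register number, 'r' = read a register, 'i' = immediate).
my %loop_ops = (
    set => ['di',  sub { my ($d, $v)     = @_; $reg[$d] = $v }],
    add => ['drr', sub { my ($d, $x, $y) = @_; $reg[$d] = $x + $y }],
);

sub run_loop_decoding {
    for my $insn (@_) {
        my ($name, @raw) = @$insn;
        my ($kinds, $fn) = @{ $loop_ops{$name} };
        my @args;
        for my $i (0 .. $#raw) {
            my $k = substr($kinds, $i, 1);
            # Only 'r' operands are dereferenced; the rest pass through as-is.
            push @args, $k eq 'r' ? $reg[ $raw[$i] ] : $raw[$i];
        }
        $fn->(@args);
    }
}

# Style 2: each op receives raw operands and decodes them itself.
my %self_ops = (
    set     => sub { my ($d, $v) = @_; $reg[$d] = $v },
    add     => sub { my ($d, $x, $y) = @_; $reg[$d] = $reg[$x] + $reg[$y] },
    makeref => sub { my ($d, $r) = @_; $reg[$d] = \$reg[$r] },   # wants the slot
);

sub run_self_decoding {
    $self_ops{ $_->[0] }->( @{$_}[1 .. $#$_] ) for @_;
}

my @prog = (['set', 0, 2], ['set', 1, 3], ['add', 2, 0, 1]);

@reg = ();
run_loop_decoding(@prog);
my $loop_result = $reg[2];

@reg = ();
run_self_decoding(@prog, ['makeref', 3, 2]);
my $self_result = $reg[2];
my $ref_value   = ${ $reg[3] };

print "loop decoding: r2 = $loop_result\n";
print "op decoding:   r2 = $self_result, r3 -> $ref_value\n";
```

In this toy, supporting makeref under style 1 would need yet another operand kind in the loop. Style 2 absorbs it locally, at the cost of every op repeating its own decoding.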