On Fri, Dec 16, 2005 at 01:29:11PM +0100, Ruud H.G. van Tol wrote:
: John Macdonald:
: 
: > [trans]
: > If a shorter rule is allowed to match first, then the longer
: > rule can be removed from the match set, at least for constant
: > string matches.
: 
: It is not about the length of the rules, but about the length of the
: matches.

They're the same thing when only fixed strings are allowed.  We've only
been calling them "rules" for lack of a better term, but they aren't
rules, or even regexes.  Transliteration is "literal".
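To make the fixed-string point concrete, here is a minimal Python sketch of a literal transliterator that always tries longer keys first. It is not Perl 6, and it is not how tr/// is actually implemented. Because every key is a plain string, sorting the keys by length is the same as sorting the possible matches by length, so '==' is found before '=' without any backtracking. The function name `translit_longest` is hypothetical.

```python
def translit_longest(text, table):
    """Transliterate `text` using literal keys, always preferring the
    longest key that matches at the current position."""
    # Longer keys first: for fixed strings, key length == match length.
    keys = sorted((k for k in table if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for k in keys:
            if text.startswith(k, i):
                out.append(table[k])
                i += len(k)
                break
        else:
            # No key matches here: copy the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(translit_longest("a==b=c", {"=": "ASSIGN", "==": "EQ"}))  # aEQbASSIGNc
```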

: If both \s+ and \h+ match the same length, should then \h+ be honored
: because it is more specific?

If you want ordered matches of real rules, you should use /<@array>/
in a real match.  The pattern equivalent to tr/// involves /<%hash>/
instead somehow.
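As an illustration of ordered matching, the Python sketch below tries a list of patterns strictly in list order at one position, which is roughly the behavior being asked of /<@array>/. The first pattern that matches wins, however specific the others are, so the user decides by ordering the list. Python's `re` has no `\h`, so `[ \t]+` stands in for horizontal whitespace here. That substitution and the helper name `first_in_order` are assumptions for the example.

```python
import re


def first_in_order(patterns, text, pos=0):
    """Return (index, matched_text) for the first pattern, in list order,
    that matches at `pos`, or None if none of them match there."""
    for i, pat in enumerate(patterns):
        m = re.compile(pat).match(text, pos)
        if m:
            return i, m.group(0)
    return None


text = " \t\nx"
print(first_in_order([r"[ \t]+", r"\s+"], text))  # (0, ' \t')
print(first_in_order([r"\s+", r"[ \t]+"], text))  # (0, ' \t\n')
```

Swapping the order changes which pattern wins and how much text is consumed; specificity plays no part.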

: And are we only talking about matches at the same position? (Stepping
: through the input-buffer character-by-character, and testing each
: pattern.)

Both s/// and tr/// have ways of skipping non-matching characters, but
they're not the same.

It would be a useful exercise to write tr/// in terms of s///.
It occurs to me that it'd be awfully useful to have a kind of hash
that returns any unmatched key unchanged.  But there's actually a
subtle conflict between how you want to use the hash on the left
and on the right.  They have the same keys but different values.

    s/(<%match>)/%replace{$0}/
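One way to attempt the tr///-via-s/// exercise, sketched in Python under the assumption that a pass-through hash exists. `PassThrough` (a hypothetical name) is a dict whose missing keys come back unchanged. The pattern is the longest-first alternation of the keys plus `.`, so every other character is also "replaced" by itself. It still does the double lookup discussed next: the regex finds the key, then the dict looks the same string up again.

```python
import re


class PassThrough(dict):
    """A dict that returns any missing key unchanged (hypothetical helper)."""

    def __missing__(self, key):
        return key


def tr_via_subst(text, mapping):
    """Transliterate literal keys in `text` with a single regex substitution."""
    # Drop empty keys and put longer literals first so '==' beats '='.
    keys = sorted((k for k in mapping if k), key=len, reverse=True)
    if not keys:
        return text
    # Alternation of the keys, then '.' so every other character matches too.
    pattern = re.compile("|".join(map(re.escape, keys)) + "|.", re.DOTALL)
    replace = PassThrough(mapping)
    # Two lookups per match: the regex finds the key, then the dict maps it.
    return pattern.sub(lambda m: replace[m.group(0)], text)


print(tr_via_subst("a==b=c", {"=": ":=", "==": "eq"}))  # aeqb:=c
```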

The way we've got hashes defined currently on the left, the lookup finds
an additional rule to continue parsing, on the assumption that the key
of %match is merely the first keyword of some longer construct.  But the
value can't simultaneously be a subsequent rule *and* a replacement value,
so we end up looking up the same string twice in two different hashes.
(Even if the first one is actually doing a trie internally or some such,
it's still effectively a hash lookup, and why do it twice?)  Maybe there's
some way to write the rule in the value of %match that matches zero
width but returns a useful value so it'd be something more like:

    s/(<%match>)/$0[0]/
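A sketch of the single-lookup idea in Python, assuming a trie is acceptable as the "hash". Each terminal node stores the replacement value under the arbitrarily chosen key `None`, so walking the trie to find the longest match also yields the replacement, and no second lookup is needed. `build_trie`, `longest_match`, and `tr_one_lookup` are hypothetical helpers, not Perl 6 API.

```python
def build_trie(mapping):
    """Nested-dict trie; the key None at a node holds that key's replacement."""
    root = {}
    for key, value in mapping.items():
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node[None] = value
    return root


def longest_match(trie, text, i):
    """Walk the trie from position i and return (length, replacement)
    for the longest key found, or None if no key matches there."""
    node, best, j = trie, None, i
    while j < len(text) and text[j] in node:
        node = node[text[j]]
        j += 1
        if None in node:
            best = (j - i, node[None])
    return best


def tr_one_lookup(text, mapping):
    trie = build_trie(mapping)
    out, i = [], 0
    while i < len(text):
        hit = longest_match(trie, text, i)
        if hit:
            length, value = hit  # match and replacement from one walk
            out.append(value)
            i += length
        else:
            out.append(text[i])  # unmatched: pass through unchanged
            i += 1
    return "".join(out)


print(tr_one_lookup("a==b=c", {"=": ":=", "==": "eq"}))  # aeqb:=c
```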

: > If, for example, '=' can match without
: > preferring to try first for '==' then you'll never match '=='
: > without syntactic help to force a backtracking retry.
: 
: If rules will match in order of appearance, it is up to the user to put the
: rules in the right order.
: 
: Some help can be provided, like a warning when an 'ab' precedes an
: 'abc', and maybe even when an 'a*' precedes an 'a+'.

Yes, that would be useful for /<@array>/ analysis.  But tr/// ain't that.
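For the /<@array>/ case, the 'ab' before 'abc' warning is easy to sketch for literal alternatives. Here is a Python version; the name `shadowed_literals` is hypothetical. It only handles plain strings. Catching an 'a*' that precedes an 'a+' would need analysis of the regexes themselves, which this does not attempt. A flagged pair is only truly shadowed when nothing forces a backtracking retry.

```python
def shadowed_literals(alternatives):
    """Return (earlier, later) pairs where an earlier literal alternative is
    a prefix of (or equal to) a later one in an ordered list."""
    pairs = []
    for i, earlier in enumerate(alternatives):
        for later in alternatives[i + 1:]:
            # 'ab' before 'abc' (or an exact duplicate) is suspicious.
            if later.startswith(earlier):
                pairs.append((earlier, later))
    return pairs


for earlier, later in shadowed_literals(["=", "==", "!="]):
    print(f"warning: {earlier!r} precedes {later!r}")
# warning: '=' precedes '=='
```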

Larry
