On May 6, 1:52 am, Wiseman <[EMAIL PROTECTED]> wrote: > On May 5, 5:12 am, "Terry Reedy" <[EMAIL PROTECTED]> wrote: > > > I believe the current Python re module was written to replace the Python > > wrapping of pcre in order to support unicode. > > I don't know how PCRE was back then, but right now it supports UTF-8 > Unicode patterns and strings, and Unicode character properties. Maybe > it could be reintroduced into Python?
"UTF-8 Unicode" is meaningless. Python has internal unicode string objects, with comprehensive support for converting to/from str (8-bit) string objects. The re module supports unicode patterns and strings. PCRE "supports" patterns and strings which are encoded in UTF-8. This is quite different, a kludge, incomparable. Operations which inspect/ modify UTF-8-encoded data are of interest only to folk who are constrained to use a language which has nothing resembling a proper unicode datatype. > > At least today, PCRE supports recursion and recursion check, > possessive quantifiers and once-only subpatterns (disables > backtracking in a subpattern), callouts (user functions to call at > given points), and other interesting, powerful features. The more features are put into a regular expression module, the more difficult it is to maintain and the more the patterns look like line noise. There's also the YAGNI factor; most folk would restrict using regular expressions to simple grep-like functionality and data validation -- e.g. re.match("[A-Z][A-Z]?[0-9]{6}[0-9A]$", idno). The few who want to recognise yet another little language tend to reach for parsers, using regular expressions only in the lexing phase. If you really want to have PCRE functionality in Python, you have a few options: (1) create a wrapper for PCRE using e.g. SWIG or pyrex or hand- crafting (2) write a PEP, get it agreed, and add the functionality to the re module (3) wait until someone does (1) or (2) for free (4) fund someone to do (1) or (2) HTH, John -- http://mail.python.org/mailman/listinfo/python-list