[issue12734] Request for property support in Python re lib

Tom Christiansen Thu, 11 Aug 2011 13:14:26 -0700

New submission from Tom Christiansen <tchr...@perl.com>:

Python supports no Unicode properties in its re library, making it unsuitable 
for work with Unicode. This is therefore a formal request for the Python re 
library to support Unicode properties.
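
To make the gap concrete, here is a minimal sketch of the workaround a script needs today in place of a property such as \p{Lu}. It builds the character class by hand from the unicodedata module. The helper name category_class and the 0x3000 scan limit are assumptions made for this illustration; neither is part of re.

```python
import re
import unicodedata

def category_class(prefix, limit=0x3000):
    """Return a regex character class of every code point below `limit`
    whose General_Category starts with `prefix` (hypothetical helper)."""
    chars = [
        chr(cp)
        for cp in range(limit)
        if unicodedata.category(chr(cp)).startswith(prefix)
    ]
    # Escape each character so none is misread as class syntax.
    return "[" + "".join(re.escape(c) for c in chars) + "]"

# Hand-rolled stand-in for \p{Lu}+ (uppercase letters), limited to the scanned range.
upper = re.compile(category_class("Lu") + "+")
print(upper.findall("\u00c9COLE normale \u03a3\u039f\u03a6\u0399\u0391"))
```

With property support in the engine, the whole helper would collapse to a single escape in the pattern, and it would not be capped at an arbitrary code point.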


The eleven properties required by Unicode Technical Report #18's RL1.2 are the 
bare minimum which must be added to make it possible to use Python regular 
expressions on Unicode. 

The proposed RL2.7 on Full Properties is even better.  That is found at

  http://unicode.org/reports/tr18/proposed.html#Full_Properties

By the time you read this, though, it will have been made an official part of 
tr18.

Matthew Barnett's replacement library for re, called regex, supports 67 Unicode 
properties at last count, including the strongly recommended loose matching.  

The standard re library needs to be spiffed up to make it suitable for Unicode 
processing; it is not currently usable for that due to this missing 
functionality.  I quote from the Level 1 conformance requirement of tr18:
    
    "Level 1: This is a minimal level for useful Unicode support. It does not 
account for end-user expectations for character support, but does satisfy most 
low-level programmer requirements. The results of regular expression matching 
at this level are independent of country or language. At this level, the user 
of the regular expression engine would need to write more complicated regular 
expressions to do full Unicode processing."

pass RL1.1 Hex Notation
fail RL1.2 Properties
fail RL1.2a Compatibility Properties
fail RL1.3 Subtraction and Intersection
fail RL1.4 Simple Word Boundaries
fail RL1.5 Simple Loose Matches
fail RL1.6 Line Boundaries
fail RL1.7 Supplementary Code Points

(withdrawn) RL2.1 Canonical Equivalents
fail RL2.2 Extended Grapheme Clusters
fail RL2.3 Default Word Boundaries
fail RL2.4 Default Case Conversion
pass RL2.5 Name Properties
fail RL2.6 Wildcards in Property Values
fail RL2.7 Full Properties
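
As a rough illustration of the two "pass" entries, the sketch below assumes RL1.1 and RL2.5 are met through Python's string-literal escapes (\uXXXX and \N{NAME}), which are expanded before the pattern ever reaches re. The list above does not say how they pass, so treat that as an assumption.

```python
import re

# RL1.1 Hex Notation: code points written as \uXXXX in the string literal.
greek = re.compile("[\u03b1-\u03c9]+")          # GREEK SMALL LETTER ALPHA..OMEGA
match = greek.search("abc \u03b1\u03b2\u03b3 def")
print(match.group())

# RL2.5 Name Properties: a character referred to by its Unicode name.
alpha = re.compile("\N{GREEK SMALL LETTER ALPHA}")
print(alpha.search("x\u03b1y").start())
```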

I won’t even talk about Level 3.  

ICU, Perl, and Java7 all meet Level One conformance requirements with several 
Level 2 requirements also met.  It is important for Python to meet the Unicode 
Standard in this so that people can use Python for regex matching Unicode text. 
They currently cannot usefully do so per the requirements of tr18.

----------
components: Regular Expressions
messages: 141925
nosy: tchrist
priority: normal
severity: normal
status: open
title: Request for property support in Python re lib
type: feature request
versions: Python 3.2

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12734>
_______________________________________