John Perks and Sarah Mount wrote: > I have to split some identifiers that are casedLikeThis into their > component words. In this instance I can safely use [A-Z] to represent > uppercase, but what pattern should I use if I wanted it to work more > generally? I can envisage walking the string testing the > unicodedata.category of each char, but is there a regex'y way to denote > "uppercase"?
In this form, it is currently not implemented, although it should be (written as [[:upper:]], I believe); contributions are welcome (make sure you read the Unicode consortium's guidelines on regular expressions before attempting to implement it). Until then, the "best" way is to use a regular character class, precomputed or computed at runtime. uni_upper = [unichr(i) for i in range(sys.maxunicode) if unichr(i).isupper()] uni_re = u"["+u"".join(uni_upper)+u"]" On my machine, this takes approximately one second to compute, which may or may not be too much as a startup cost. To speed this up, you could dump the resulting uni_re into a Python source file. Regards, Martin -- http://mail.python.org/mailman/listinfo/python-list