John,

> (1) raw string for improved legibility
> ru'(?u)\b([á-ñ]{2,}\s+)([<<"][Á-Ñá-ñ]+)(\s*-?[Á-Ñá-ñ]+)*([>>"])'

This actually escaped my notice until after I had posted -- the letters with diacritics are incorrectly decoded Cyrillic letters. I suppose I could use the Unicode escape sequences (the sets [á-ñ] and [Á-Ñá-ñ] are the Cyrillic equivalents of [a-z] and [A-Za-z]), but then suddenly the legibility goes out the window again.

> (3) what appears between [] is a set of characters, so [<<"] is the
> same as [<"] and probably isn't doing what you expect; have you tested
> this regex for correctness?

These were angled quotation marks in the original Unicode. Sorry again. The regex matches everything it is supposed to. The extra parentheses were there because I had somehow missed the .group method, and it had only been returning what was in the one needed set of parentheses.

> I can't imagine how "not a programmer" implies "interested to know if
> there is a more elegant way".

More carefully stated: "I am self-taught, have no real training or experience as a programmer, and would be interested in seeing how a programmer with training and experience would go about this."

Thank you,
Jonathan
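
For reference, here is a minimal sketch of the escape-sequence variant mentioned above. It is not the regex as originally posted. It assumes Python 3 (where `re` understands `\u` escapes and `\b` is Unicode-aware by default), assumes the intended sets are U+0430-U+044F and U+0410-U+042F, and assumes the angled quotation marks were U+00AB and U+00BB. The group names `lead`, `open`, `name` and `close` and the sample sentence are made up for illustration. Verbose mode and named ranges are one way to keep the escapes from costing too much legibility.

```python
import re

# Assumed ranges: U+0430-U+044F is the lowercase Cyrillic run, U+0410-U+042F the uppercase run.
# Note that the letters at U+0451 and U+0401 (yo) fall outside these ranges.
LOWER = r"\u0430-\u044f"
LETTER = r"\u0410-\u042f\u0430-\u044f"

QUOTED_NAME = re.compile(
    rf"""
    \b(?P<lead>[{LOWER}]{{2,}})\s+     # lowercase word before the quotation
    (?P<open>[\u00ab"])                # opening angled quote or straight quote
    (?P<name>[{LETTER}]+               # first word inside the quotes
        (?:\s*-?[{LETTER}]+)*)         # further words, optionally hyphenated
    (?P<close>[\u00bb"])               # closing angled quote or straight quote
    """,
    re.VERBOSE,
)

# Hypothetical sample text, only for illustration.
sample = "газета «Новая газета» вышла"
m = QUOTED_NAME.search(sample)
if m:
    print(m.group("lead"))   # the word before the quotation
    print(m.group("name"))   # the text between the quotation marks
    print(m.group(0))        # the whole match
```

With named groups, `.group("name")` picks out just the quoted text, so no extra parentheses are needed to get at one part of the match.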