New submission from Zack Weinberg: The `re` module documentation does not do a good job of explaining exactly what `\w` matches. Quoting https://docs.python.org/3.5/library/re.html :
> \w > For Unicode (str) patterns: > Matches Unicode word characters; this includes most characters > that can be part of a word in any language, as well as numbers > and the underscore. Empirically, this appears to mean "everything in Unicode general categories L* and N*, plus U+005F (underscore)". That is a perfectly sensible definition and the documentation should state it in those terms. "Unicode word characters" could mean any number of different things; note for instance that UTS#18 gives a very different definition. (Further reading: https://gist.github.com/zackw/3077f387591376c7bf67 plus links therefrom). ---------- assignee: docs@python components: Documentation messages: 255463 nosy: docs@python, zwol priority: normal severity: normal status: open title: Clarify exactly what \w matches in UNICODE mode versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6 _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue25743> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com