Oh, a further thought...
On Thursday 05 May 2016 16:46, Stephen Hansen wrote:

> On Wed, May 4, 2016, at 11:04 PM, Steven D'Aprano wrote:
>> Start by writing a function or a regex that will distinguish strings that
>> match your conditions from those that don't. A regex might be faster, but
>> here's a function version.
>> ... snip ...
>
> Yikes. I'm all for the idea that one shouldn't go to regex when Python's
> powerful string type can answer the problem more clearly, but this seems
> to go out of its way to do otherwise.
>
> I don't even care about faster: Its overly complicated. Sometimes a
> regular expression really is the clearest way to solve a problem.

Putting non-ASCII letters aside for the moment, how would you match these
specs as a regular expression?

- All uppercase ASCII letters (A to Z only), optionally separated into
  words by either a bare ampersand (e.g. "AAA&AAA") or an ampersand with
  leading and trailing spaces (spaces only, not arbitrary whitespace):
  "AAA & AAA".

- The number of spaces on either side of the ampersands need not be the
  same: "AAA& BBB & CCC" should match.

- Leading or trailing spaces, or spaces not surrounding an ampersand, must
  not match: "AAA BBB" must be rejected.

- Leading or trailing ampersands must also be rejected. This includes the
  case where the string is nothing but ampersands.

- Consecutive ampersands "AAA&&&BBB" and the empty string must be rejected.

I get something like this:

    r"(^[A-Z]+$)|(^([A-Z]+[ ]*\&[ ]*[A-Z]+)+$)"

but it fails on strings like "AA & A & A". What am I doing wrong?

For the record, here's my brief test suite:

    import re

    def test(pat):
        # Strings that must be rejected.
        for s in ("", " ", "&", "A A", "A&", "&A", "A&&A", "A& &A"):
            assert re.match(pat, s) is None
        # Strings that must be accepted.
        for s in ("A", "A & A", "AA&A", "AA & A & A"):
            assert re.match(pat, s)

--
Steve
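One possible reading of why the pattern above fails: the repeated group ([A-Z]+[ ]*\&[ ]*[A-Z]+)+ demands a complete word-ampersand-word unit on every repetition, so after consuming "AA & A" the next repetition has to start with a letter but finds " & A". A minimal sketch of an alternative, assuming the spec is read as "one word, followed by zero or more (ampersand, word) pairs", is below. The names PATTERN and matches are hypothetical, introduced only for this illustration.

```python
import re

# Assumed reading of the spec: one or more A-Z words, where each further
# word is introduced by exactly one "&" with zero or more plain spaces
# on either side of it.
PATTERN = r"^[A-Z]+(?: *& *[A-Z]+)*$"

def matches(s):
    """Return True if s satisfies the spec (hypothetical helper)."""
    return re.match(PATTERN, s) is not None

REJECT = ("", " ", "&", "A A", "A&", "&A", "A&&A", "A& &A", "&&&", " A", "A ")
ACCEPT = ("A", "A & A", "AA&A", "AA & A & A", "AAA& BBB & CCC")

for s in REJECT:
    assert not matches(s), s
for s in ACCEPT:
    assert matches(s), s
```

Note that "$" also matches just before a trailing newline, so "A\n" would be accepted by this sketch; if that matters, ending the pattern with \Z or calling re.fullmatch (Python 3.4 and later) should close that gap.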