On Thu, May 5, 2016, at 03:36, Steven D'Aprano wrote:
> Putting non-ASCII letters aside for the moment, how would you match these
> specs as a regular expression?

Well, obviously *your* language (not the OP's), given the cases you reject, is "one or more sequences of letters separated by space*-ampersand-space*", and that is actually one of the easiest kinds of regex to write: "[A-Z]+( *& *[A-Z]+)*".

However, your spec is wrong:

> - Leading or trailing spaces, or spaces not surrounding an ampersand,
>   must not match: "AAA BBB" must be rejected.

The *very first* item in OP's list of good outputs is 'PHYSICAL FITNESS CONSULTANTS & TRAINERS'.

If you want something that's extremely conservative (except for the *very odd in context* choice of allowing arbitrary numbers of spaces - why would you allow this but reject leading or trailing space?) and accepts all of OP's input:

[A-Z]+(( *& *| +)[A-Z]+)*

-- 
https://mail.python.org/mailman/listinfo/python-list
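
For anyone who wants to try the two patterns, here is a minimal sketch using Python's re module. It assumes the whole string has to match, so it uses re.fullmatch rather than adding explicit anchors. The names AMPERSAND_ONLY, AMPERSAND_OR_SPACE and matches() are made up for this illustration and are not from the thread.

```python
import re

# Stricter pattern: runs of capital letters joined only by an ampersand,
# with any number of spaces (including none) on either side of it.
# (Name is illustrative, not from the original thread.)
AMPERSAND_ONLY = re.compile(r"[A-Z]+( *& *[A-Z]+)*")

# Looser pattern: words may also be separated by one or more plain spaces.
# (Name is illustrative, not from the original thread.)
AMPERSAND_OR_SPACE = re.compile(r"[A-Z]+(( *& *| +)[A-Z]+)*")


def matches(pattern, text):
    """Return True if the entire string matches the compiled pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        "AAA & BBB",
        "AAA&BBB",
        "AAA BBB",
        "PHYSICAL FITNESS CONSULTANTS & TRAINERS",
        " AAA",
        "AAA ",
    ]
    for s in samples:
        print(repr(s), matches(AMPERSAND_ONLY, s), matches(AMPERSAND_OR_SPACE, s))
```

Only the second pattern accepts the OP's 'PHYSICAL FITNESS CONSULTANTS & TRAINERS'; both still reject leading or trailing spaces.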