On Fri, 6 May 2016 03:49 am, Jussi Piitulainen wrote:

> Steven D'Aprano writes:
>
>> I get something like this:
>>
>> r"(^[A-Z]+$)|(^([A-Z]+[ ]*\&[ ]*[A-Z]+)+$)"
>>
>>
>> but it fails on strings like "AA & A & A". What am I doing wrong?
>
> It cannot split the string as (LETTERS & LETTERS)(LETTERS & LETTERS)
> when the middle part is just one LETTER. That's something of a
> misanalysis anyway. I notice that the correct pattern has already been
> posted at least thrice and you have acknowledged one of them.

Thrice? I've seen Peter's response (he made the trivial and obvious
simplification of just using A instead of [A-Z], but that was easy to
understand), and Random832 almost got it, missing only that you need to
match the entire string, not just a substring. If there was a third
response, I missed it.

> But I think you are also trying to do too much with a single regex. A
> more promising start is to think of the whole string as "parts" joined
> with "glue", then split with a glue pattern and test the parts:
>
> import re
> glue = re.compile(" *& *| +")
> keep, drop = [], []
> for datum in data:
>     items = glue.split(datum)
>     if all(map(str.isupper, items)):
>         keep.append(datum)
>     else:
>         drop.append(datum)

Ah, the penny drops! For a while I thought you were suggesting using this
to assemble a regex, and it just wasn't making sense to me. Then I realised
you were using this as a matcher: feed in the list of strings, and it
splits it into strings to keep and strings to discard.

Nicely done, that is a good technique to remember.

Thanks for the analysis!

-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list
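
For anyone reading this in the archive, here is a rough sketch of the
failure discussed above, together with one possible whole-string pattern.
It is not necessarily the pattern that was posted earlier in the thread.
The names `original`, `GLUE`, `candidate` and `is_valid` are made up for
illustration, and the glue is assumed to be the one from Jussi's splitter
(an ampersand with optional spaces, or a run of spaces).

```python
import re

# The pattern quoted above: every repetition of the group must consume
# a complete "LETTERS & LETTERS" pair.
original = re.compile(r"(^[A-Z]+$)|(^([A-Z]+[ ]*\&[ ]*[A-Z]+)+$)")

# Hypothetical alternative: one run of letters, followed by zero or more
# (glue, letters) pairs. GLUE mirrors the glue pattern in Jussi's code.
GLUE = r"(?: *& *| +)"
candidate = re.compile(r"[A-Z]+(?:" + GLUE + r"[A-Z]+)*")


def is_valid(s):
    """Return True if the whole string is letter runs joined by glue."""
    # fullmatch insists on matching the entire string, not a substring.
    return candidate.fullmatch(s) is not None


# A single middle letter cannot be shared between two pairs, so the
# original pattern rejects this string:
print(original.match("AA & A & A"))

# With a two-letter middle part the regex can split it as (AA & A)(A & A):
print(original.match("AA & AA & A"))

# The candidate pattern accepts both, and rejects a dangling ampersand:
print(is_valid("AA & A & A"), is_valid("AA & AA & A"), is_valid("AA &"))
```

Note that this is slightly stricter than the split-and-test version:
str.isupper() also accepts items such as "A1", whereas [A-Z]+ does not.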