On Jan 25, 5:59 am, Scott David Daniels <scott.dani...@acm.org> wrote:
> Sean Brown wrote:
> > I have the following string ...: "td[ct] = [[ ... ]];\r\n"
> > The ... (representing text in the string) is what I'm extracting ....
> > So I think the regex \[\[(.*)\]\]; should do it.
> > The problem is it appears that python is escaping the \ in the regex
> > because I see this:
> > >>> reg = '\[\[(.*)\]\];'
> > >>> reg
> > '\\[\\[(.*)\\]\\];'
> > Now to me looks like it would match the string - \[\[ ... \]\];
> > ...
>
> OK, you already have a good answer as to what is happening.
> I'll mention that raw strings were put in the language exactly for
> regex work. They are useful for any time you need to use the backslash
> character (\) within a string (but not as the final character).
> For example:
>     len(r'\a\b\c\d\e\f\g\h') == 16 and len('\a\b\c\d\e\f\g\h') == 13
>
> If you get in the habit of typing regex strings as r'...' or r"...",
> and examining the patterns with print(somestring), you'll ease your life.
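For reference, the quoted point about raw strings can be checked with a short sketch. The sample line and the variable names are made up for illustration (the original post elides the real contents with ...), so treat this as one way to see the behaviour rather than the poster's actual code:

```python
import re

# The same pattern spelled two ways: doubled backslashes vs. a raw string.
escaped = '\\[\\[(.*)\\]\\];'
raw = r'\[\[(.*)\]\];'
same_pattern = (escaped == raw)

# repr() doubles each backslash for display (this is what the interactive
# prompt shows); print() shows the characters the regex engine receives.
pattern_repr = repr(raw)
print(pattern_repr)
print(raw)

# Hypothetical sample line; the real text between [[ and ]] is not known.
line = "td[ct] = [[ 1, 2, 3 ]];\r\n"
match = re.search(raw, line)
extracted = match.group(1)
print(extracted)

# The length comparison from the quoted reply. In a non-raw literal \a, \b
# and \f each collapse to one character; the other five pairs keep their
# backslash (written doubled here so the literal has no unknown escapes).
raw_len = len(r'\a\b\c\d\e\f\g\h')
cooked_len = len('\a\b\\c\\d\\e\f\\g\\h')
print(raw_len, cooked_len)
```

So the doubled backslashes are only how the interpreter displays the string; the pattern itself has single backslashes and matches as intended.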
All excellent suggestions, but I'm surprised that nobody has mentioned
the re.VERBOSE format.

Manual sez:
'''
re.X
re.VERBOSE
This flag allows you to write regular expressions that look nicer.
Whitespace within the pattern is ignored, except when in a character
class or preceded by an unescaped backslash, and, when a line contains
a '#' neither in a character class or preceded by an unescaped
backslash, all characters from the leftmost such '#' through the end of
the line are ignored.

That means that the two following regular expression objects that match
a decimal number are functionally equal:

a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")
'''

My comments:
(1) "looks nicer" is not the point; it's understandability
(2) if you need a space, use a character class ->[ ]<- not an
unescaped backslash ->\ <-
(3) the indentation in the manual doesn't fit my idea of "looks
nicer"; I'd do

a = re.compile(r"""
    \d +  # the integral part
    \.    # the decimal point
    \d *  # some fractional digits
    """, re.X)

(4) you can aid understandability by more indentation especially when
you have multiple capturing expressions and (?......) gizmoids e.g.

r"""
    (
        .....       # prefix
    )
    (
        (?......)   # look-back assertion
        (?....)     # etc etc
    )
"""

Worth a try if you find yourself going nuts getting the parentheses
matching.

Cheers,
John
--
http://mail.python.org/mailman/listinfo/python-list
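P.S. A small runnable sketch of the re.VERBOSE points above, in case anyone wants to try them. The sample strings and variable names are made up for illustration; the decimal-number pattern is the one from the manual, and the two-word pattern is only an assumed example of comment (2):

```python
import re

# Compact and verbose spellings of the manual's decimal-number pattern.
compact = re.compile(r"\d+\.\d*")
verbose = re.compile(r"""
    \d +    # the integral part
    \.      # the decimal point
    \d *    # some fractional digits
    """, re.VERBOSE)

# Made-up samples: both patterns should agree on every one of them.
samples = ["3.14", "42.", ".5", "abc"]
compact_hits = [bool(compact.fullmatch(s)) for s in samples]
verbose_hits = [bool(verbose.fullmatch(s)) for s in samples]
print(compact_hits, verbose_hits)

# Comment (2): a literal space has to be protected in verbose mode, and a
# character class [ ] is easier to spot than a backslash followed by a space.
spaced = re.compile(r"""
    (\w+)   # first word
    [ ]     # exactly one literal space
    (\w+)   # second word
    """, re.VERBOSE)
words = spaced.fullmatch("hello world").groups()
print(words)
```

The two compiled patterns behave the same; the verbose one just leaves room for a comment on each piece.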