[EMAIL PROTECTED] wrote:
> On Apr 5, 4:22 pm, Duncan Booth <[EMAIL PROTECTED]> wrote:
>> Can you come up with a real example where this happens and which cannot be
>> easily rewritten to provide better, clearer code without the indentation?
>>
>> I'll admit to having occasionally had code not entirely dissimilar to this
>> when first written, but I don't believe it has ever survived more than a
>> few minutes before being refactored into a cleaner form. I would claim that
>> it is a good thing that Python makes it obvious that code like this should
>> be refactored.
>
> I am trying to write a parser for a text string. Specifically, I am
> trying to take a filename that contains meta-data about the content of
> the A/V file (mpg, mp3, etc.).
>
> I first split the filename into fields separated by spaces and dots.
>
> Then I have a series of regular expression matches. I like
> Cartesian's 'event-based' parser approach though the event table gets a
> bit unwieldy as it grows. Also, I would prefer to have the 'action'
> result in a variable assignment specific to the test. E.g.
>
> def parseName(name):
>     fields = sd.split(name)
>     fields, ext = fields[:-1], fields[-1]
>     year = ''
>     capper = ''
>     series = None
>     episodeNum = None
>     programme = ''
>     episodeName = ''
>     past_title = False
>     for f in fields:
>         if year_re.match(f):
>             year = f
>             past_title = True
>         else:
>             my_match = capper_re.match(f)
>             if my_match:
>                 capper = my_match.group(1)
>                 if capper == 'JJ' or capper == 'JeffreyJacobs':
>                     capper = 'Jeffrey C. Jacobs'
>                 past_title = True
>             else:
>                 my_match = epnum_re.match(f)
>                 if my_match:
>                     series, episodeNum = my_match.group('series', 'episode')
>                     past_title = True
>                 else:
>                     # If I think of other parse elements, they go here.
>                     # Otherwise, name is part of a title; check for
>                     # capitalization
>                     if f[0] >= 'a' and f[0] <= 'z' and f not in do_not_capitalize:
>                         f = f.capitalize()
>                     if past_title:
>                         if episodeName: episodeName += ' '
>                         episodeName += f
>                     else:
>                         if programme: programme += ' '
>                         programme += f
>
>     return programme, series, episodeName, episodeNum, year, capper, ext
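
For anyone following along, the split step described above could be as simple as the following. The name sd comes from the quoted code; the pattern itself and the sample filename are my guesses, since neither was posted.

```python
import re

# Assumed definition of `sd`: split on runs of spaces and dots.
sd = re.compile(r'[ .]+')

# Hypothetical filename, made up for illustration.
fields = sd.split('doctor who S01E12 1990 by[JJ].mpg')

# The last field is the extension; everything before it is metadata.
fields, ext = fields[:-1], fields[-1]
print(fields, ext)
```

Splitting on runs of separators (the trailing `+`) avoids empty fields when a name contains something like `". "`.
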
Why can't you combine your regular expressions into a single expression,
e.g. something like::

    >>> import re
    >>> exp = r'''
    ... (?P<year>\d{4})
    ... |
    ... by\[(?P<capper>.*)\]
    ... |
    ... S(?P<series>\d\d)E(?P<episode>\d\d)
    ... '''
    >>> matcher = re.compile(exp, re.VERBOSE)
    >>> matcher.match('1990').groupdict()
    {'series': None, 'capper': None, 'episode': None, 'year': '1990'}
    >>> matcher.match('by[Jovev]').groupdict()
    {'series': None, 'capper': 'Jovev', 'episode': None, 'year': None}
    >>> matcher.match('S01E12').groupdict()
    {'series': '01', 'capper': None, 'episode': '12', 'year': None}

Then your code above would look something like::

    for f in fields:
        match = matcher.match(f)
        if match is not None:
            # Only one alternative matches per field, so fall back to the
            # value found earlier when a group is None.
            year = match.group('year') or year
            capper = match.group('capper') or capper
            if capper == 'JJ' or capper == 'JeffreyJacobs':
                capper = 'Jeffrey C. Jacobs'
            series = match.group('series') or series
            episodeNum = match.group('episode') or episodeNum
            past_title = True
        else:
            if 'a' <= f[0] <= 'z' and f not in do_not_capitalize:
                f = f.capitalize()
            if past_title:
                if episodeName: episodeName += ' '
                episodeName += f
            else:
                if programme: programme += ' '
                programme += f

STeVe
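
P.S. Putting the pieces together, here is a self-contained sketch of the whole function using the combined pattern. Treat it as one possible arrangement rather than a drop-in replacement: SPLITTER, DO_NOT_CAPITALIZE and CAPPER_ALIASES are names I made up, the token formats are the ones guessed in the pattern above (your year_re, capper_re and epnum_re were not posted), and the sample filename is invented. It copies only the groups that actually matched, so a later field cannot reset an earlier result to None.

```python
import re

# Assumed token formats; the original year_re, capper_re and epnum_re
# were never shown, so these are guesses.
MATCHER = re.compile(r'''
    (?P<year>\d{4})
    |
    by\[(?P<capper>.*)\]
    |
    S(?P<series>\d\d)E(?P<episode>\d\d)
''', re.VERBOSE)

SPLITTER = re.compile(r'[ .]+')                      # assumed stand-in for `sd`
DO_NOT_CAPITALIZE = {'a', 'an', 'and', 'of', 'the'}  # hypothetical word list
CAPPER_ALIASES = {                                   # aliases from the quoted code
    'JJ': 'Jeffrey C. Jacobs',
    'JeffreyJacobs': 'Jeffrey C. Jacobs',
}


def parse_name(name):
    fields = SPLITTER.split(name)
    fields, ext = fields[:-1], fields[-1]
    info = {'year': '', 'capper': '', 'series': None, 'episode': None}
    programme, episode_name = [], []
    past_title = False
    for f in fields:
        if not f:
            # Skip empty fields, e.g. from a leading dot.
            continue
        match = MATCHER.match(f)
        if match is not None:
            # Only one alternative matches per field; copy just the
            # groups that matched and leave the others untouched.
            for key, value in match.groupdict().items():
                if value is not None:
                    info[key] = value
            past_title = True
        else:
            if 'a' <= f[0] <= 'z' and f not in DO_NOT_CAPITALIZE:
                f = f.capitalize()
            # Words before the first metadata field belong to the
            # programme title, words after it to the episode name.
            (episode_name if past_title else programme).append(f)
    capper = CAPPER_ALIASES.get(info['capper'], info['capper'])
    return (' '.join(programme), info['series'], ' '.join(episode_name),
            info['episode'], info['year'], capper, ext)


print(parse_name('doctor who S01E12 the pilot 1990 by[JJ].mpg'))
```

For that made-up filename this prints ('Doctor Who', '01', 'the Pilot', '12', '1990', 'Jeffrey C. Jacobs', 'mpg'). Note that match() only anchors at the start of the field, so a field such as '1990s' would still count as a year; add '$' to each alternative if that matters for your filenames.
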