Steven Bethard wrote:
> [EMAIL PROTECTED] wrote:
>> On Apr 5, 4:22 pm, Duncan Booth <[EMAIL PROTECTED]> wrote:
>>> Can you come up with a real example where this happens and which cannot
>>> be easily rewritten to provide better, clearer code without the
>>> indentation?
>>>
>>> I'll admit to having occasionally had code not entirely dissimilar to
>>> this when first written, but I don't believe it has ever survived more
>>> than a few minutes before being refactored into a cleaner form. I would
>>> claim that it is a good thing that Python makes it obvious that code
>>> like this should be refactored.
>>
>> I am trying to write a parser for a text string. Specifically, I am
>> trying to take a filename that contains meta-data about the content of
>> the A/V file (mpg, mp3, etc.).
>>
>> I first split the filename into fields separated by spaces and dots.
>>
>> Then I have a series of regular expression matches. I like
>> Cartesian's 'event-based' parser approach, though the event table gets a
>> bit unwieldy as it grows. Also, I would prefer to have the 'action'
>> result in a variable assignment specific to the test. E.g.
>>
>> def parseName(name):
>>     fields = sd.split(name)
>>     fields, ext = fields[:-1], fields[-1]
>>     year = ''
>>     capper = ''
>>     series = None
>>     episodeNum = None
>>     programme = ''
>>     episodeName = ''
>>     past_title = False
>>     for f in fields:
>>         if year_re.match(f):
>>             year = f
>>             past_title = True
>>         else:
>>             my_match = capper_re.match(f)
>>             if my_match:
>>                 capper = capper_re.match(f).group(1)
>>                 if capper == 'JJ' or capper == 'JeffreyJacobs':
>>                     capper = 'Jeffrey C. Jacobs'
>>                 past_title = True
>>             else:
>>                 my_match = epnum_re.match(f)
>>                 if my_match:
>>                     series, episodeNum = my_match.group('series', 'episode')
>>                     past_title = True
>>                 else:
>>                     # If I think of other parse elements, they go here.
>>                     # Otherwise, name is part of a title; check for capitalization
>>                     if f[0] >= 'a' and f[0] <= 'z' and f not in do_not_capitalize:
>>                         f = f.capitalize()
>>                     if past_title:
>>                         if episodeName: episodeName += ' '
>>                         episodeName += f
>>                     else:
>>                         if programme: programme += ' '
>>                         programme += f
>>
>>     return programme, series, episodeName, episodeNum, year, capper, ext
>
> Why can't you combine your regular expressions into a single expression,
> e.g. something like::
>
>     >>> exp = r'''
>     ... (?P<year>\d{4})
>     ... |
>     ... by\[(?P<capper>.*)\]
>     ... |
>     ... S(?P<series>\d\d)E(?P<episode>\d\d)
>     ... '''
>     >>> matcher = re.compile(exp, re.VERBOSE)
>     >>> matcher.match('1990').groupdict()
>     {'series': None, 'capper': None, 'episode': None, 'year': '1990'}
>     >>> matcher.match('by[Jovev]').groupdict()
>     {'series': None, 'capper': 'Jovev', 'episode': None, 'year': None}
>     >>> matcher.match('S01E12').groupdict()
>     {'series': '01', 'capper': None, 'episode': '12', 'year': None}
>
> Then your code above would look something like::
>
>     for f in fields:
>         match = matcher.match(f)
>         if match is not None:
>             year = match.group('year')
>             capper = match.group('capper')
>             if capper == 'JJ' or capper == 'JeffreyJacobs':
>                 capper = 'Jeffrey C. Jacobs'
>             series = match.group('series')
>             episodeNum = match.group('episode')
>             past_title = True
I guess you need to be a little more careful here not to overwrite your
old values, e.g. something like::

    year = match.group('year') or year
    capper = match.group('capper') or capper
    ...

STeVe
--
http://mail.python.org/mailman/listinfo/python-list
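For anyone who wants to try the combined-pattern idea end to end, here is a rough, self-contained sketch in Python 3 that folds in the "or old_value" fix. It is only an illustration under assumptions: the original post never defines `sd` or `do_not_capitalize`, so `FIELD_SPLIT` (splitting on runs of spaces and dots) and `DO_NOT_CAPITALIZE` (a small guessed word list) are hypothetical stand-ins, as are the names `CAPPER_ALIASES` and `parse_name`. It also uses `fullmatch` rather than `match`, a deliberate deviation so that a title word which merely starts with four digits is not taken as a year.

```python
import re

# Hypothetical stand-ins for names the post leaves undefined (sd, do_not_capitalize):
FIELD_SPLIT = re.compile(r'[ .]+')                   # assumed: split on runs of spaces/dots
DO_NOT_CAPITALIZE = {'a', 'an', 'and', 'of', 'the'}  # assumed word list

# The single alternation from the quoted interactive session.
MATCHER = re.compile(r'''
    (?P<year>\d{4})
    |
    by\[(?P<capper>.*)\]
    |
    S(?P<series>\d\d)E(?P<episode>\d\d)
''', re.VERBOSE)

# Hypothetical lookup replacing the "capper == 'JJ' or ..." test.
CAPPER_ALIASES = {'JJ': 'Jeffrey C. Jacobs', 'JeffreyJacobs': 'Jeffrey C. Jacobs'}


def parse_name(name):
    *fields, ext = FIELD_SPLIT.split(name)
    fields = [f for f in fields if f]  # drop empty fields from a leading separator
    year = capper = ''
    series = episode_num = None
    programme, episode_name = [], []
    past_title = False
    for f in fields:
        match = MATCHER.fullmatch(f)
        if match is not None:
            # Only one branch matched, so the other groups are None;
            # "or old_value" keeps what earlier fields already set.
            year = match.group('year') or year
            capper = match.group('capper') or capper
            series = match.group('series') or series
            episode_num = match.group('episode') or episode_num
            past_title = True
        else:
            # Plain title word: capitalize unless it is on the exception list.
            if 'a' <= f[0] <= 'z' and f not in DO_NOT_CAPITALIZE:
                f = f.capitalize()
            (episode_name if past_title else programme).append(f)
    capper = CAPPER_ALIASES.get(capper, capper)
    return (' '.join(programme), series, ' '.join(episode_name),
            episode_num, year, capper, ext)
```

For example, `parse_name('simpsons 1990 S01E12 by[JJ] bart the genius.avi')` returns `('Simpsons', '01', 'Bart the Genius', '12', '1990', 'Jeffrey C. Jacobs', 'avi')`. The year survives the later `S01E12` and `by[JJ]` fields only because of the `or year` fallback; without it, each match would reset the fields its branch did not capture back to `None`. Collecting title words in lists and joining once at the end also removes the `if episodeName: episodeName += ' '` bookkeeping.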