In article <[EMAIL PROTECTED]>, Steve <[EMAIL PROTECTED]> wrote: >Hi All! > >Thanks for your suggestions and comments! I was able to use some of >your code and suggestions and have come up with this new version of >Report.py. > >Here's the updated code : > >exponentPattern = re.compile('\(^\|[^\\#]\)|#+\.#+\*\*\*\*') >floatPattern = re.compile('\(^\|[^\\#]\)|#+\.#+') >integerPattern = re.compile("\(^\|[^\\#]\)|\##+") >leftJustifiedStringPattern = re.compile('\(^\|[^\\<]\)|\<<+') >rightJustifiedStringPattern = re.compile('\(^\|[^\\>]\)|\>>+')
Some comments and suggestions If you want to include backslashes in a string, either use raw strings, or use double backslashes (raw strings are much easier to read). Otherwise, you have an accident waiting to happen - '\(' _does_ make a two character string as you desired, but '\v' makes a one character string, which is unlikely to be what you wanted. That's a stylistic point. But more serious is the leading part of all your regexes - '\(^\|[^\\#]\)|'. I'm not sure what you're trying to accomplish - presumably to skip over escaped format characters, but that's not what they do. \( says to look for an open parenthesis (you've escaped it, so it's not a grouping character. ^ says look at the start of the line. This means the first character can never match an open parens, so this entire term, up to the alternate expression (after the non-escaped pipe symbol) never matches anything. If you want to ignore escaped formating characters before a format, then you should use a negative lookbehind assertation (see the library reference, 4.2.1, Regular Expression Syntax: '(?<!\\)' This says that the match to the format can't start immediately after a backslash. You need to make a final pass over your format to remove the extra backslashes, otherwise they will appear in the output, which you do, but you replace them with spaces, which may not be the right thing to do - how could you output a line of like '################' as part of a template? Other odds and ends comments interleaved here >################################################################### ># _make_printf # >################################################################### > >def _make_printf(s): > """Convert perl style format symbology to printf tokens. > > Take a string and substitute computed printf tokens for perl style > format symbology. > > For example: > > ###.## yields %6.2f > ######## yields %8d > <<<<< yields %-5s > """ ># print("Original String = %s\n\n") % (s) > > > while 1: # process all sci notation fields > if exponentPattern.search(s) < 0: break > i1 , i2 = exponentPattern.search(s).span() > width_total = i2 - i1 > field = s[i1:i2-4] > width_mantissa = len( field[string.find(field,'.')+1:] ) > f = '%'+`width_total`+'.'+`width_mantissa`+'e' > s = exponentPattern.sub(f, s, 1) There are better ways to examine a match than with span() - consider using grouping in your regex to get the mantissa width and the total width, use a regex with grouping like this: If: exponentPattern = re.compile(r'(?<!\\)(#+\.(#+)\*\*\*\*')) then you could do this: m = re.match(exponentPattern, s) if m: s = exponentPattern.sub("%%%d.%de" % (len(m.groups()[0], len(m.groups()[1]), s, 1) m.groups()[0] will be the entire '#+\.#\*\*\'*\*') match, in other words the field width to be printed m.groups()[1] will be the string after the decimal point, not inclouding the '*'s In my opinion, building the string by using the sprintf like '%' formatting operator, rather than adding together a bunch of substrings is easier to read and maintain. Similar use of grouping can be done for the other format string types. > s = re.sub('\\\\', ' ', s) > return s As I noted, should backslashes be converted to spaces? And again, it's easier to type and read if it uses raw strings: s = re.sub(r'\\', ' ', s) >################################################################### ># ReportTemplate # >################################################################### > >class ReportTemplate: > """Provide a print formatting object. [Snip] > The total width of the symbol and it's decimal point position is > used to compute the appropriate printf token; see 'make_printf' > method. The symbol must have at least two adjacent characters for A minor nit - make_printf is better described as a function, not a method (it's not part of the ReportTemplate class or any other cleass) [SNIP] > def __init__( self, template = ''): > self.body = [] > self.vars = [] > > #read in and parse a format template > try: > tpl = open(template, 'r') > lines = string.split(tpl.read(), '\n')[:-1] You'd be better off to use something like: lines = [] for l in open(template, 'r').readlines(): lines.append(l.rstrip) The use of rstrip discards any trailing whitespace and deals with reading Windows generated files on a Unix box, where lines will end in CRLF and you'd strip only the LF > except IOError: > lines = string.split(template, '\n') I have my doubts about the advisability of assuming that you are either passed the name of a file containing a template or a template itself. A misspelled file name won't raise an error, it will simply be processed as a fixed output. I would have passed a flag to say if the template argument was file name or a template and terminated with an error if it was a non-existant file. [SNIP] > def _format( self, dataobj ): > """Return the values of the given data object substituted into > the template format stored in this object. > """ > # value[] is a list of lists of values from the dataobj > # body[] is the list of strings with %tokens to print > # if value[i] == None just print the string without the % >argument > s = '' > value = [] > > for i in range(self.nrows): > value.append([]) > > for i in range(self.nrows): > for vname in self.vars[i]: > try: > if string.find(vname, '[') < 0: > # this is the nominal case and a simple get >will be faster > value[i].append(getattr(dataobj, vname)) > else: > # I use eval so that I can support sequence >values > # although it's slow. > value[i].append(eval('dataobj.'+vname)) > except AttributeError, SyntaxError: > value[i].append('') There's another way to do this - use getattr to retrieve the sequence, then use __getitem__ to index it Something like this would work, again using a regex to separate out the index (you might want the regex compiled once at __init__ time). The regex looks for a run of characters valid in a python variable name followed by an optional integer (with or without sign) index in square brackets. m.groups()[0] is the variable name portion, m.grousp()[1] is the index if there's a subscripting term. try: m = re.match(r'([a-zA-Z_][a-zA-Z0-9._]*)\s*(?:\[\s*([+-]?\d+)\s*\])?\s*', value[i]) if not m: raise SyntaxError if m.groups()[1] is None: value[i].append(getattr(dataobj, vname)) else: value[i].append(getattr(m.groups()[0]).\ __getitem__(int(m.groups()[1]))) except AttributeError, SyntaxError, IndexError: > value[i].append('') This is a bit ugly, but avoids eval with all the dangers it carries - a deliberately hacked template file can be used to do a lot of damage, a badly written one could be hard to debug > if value[i][0] != '': > try: > temp_vals = [] > for item in value[i]: > # items on the list of values for this line > # can be either literals or lists > if type(item) == ListType: Might you be better off asking if item has a __getitem__? It would then work with tuples and Extending to dictionaries would then be easier >def isReportTemplate(obj): > """Return 1 if obj is an instance of class ReportTemplate. > """ > if type(obj) == InstanceType and \ > string.find(`obj.__class__` , ' ReportTemplate ') > -1: > return 1 > else: > return 0 Why not just use isinstance(obj, classname)? >################################################################### ># ColumnReportTemplate # >################################################################### > >class ColumnReportTemplate: > """This class allows one to specify column oriented output formats. > -- Jim Segrave ([EMAIL PROTECTED]) -- http://mail.python.org/mailman/listinfo/python-list