Hello,

Lately I have been writing a lot of list join() operations, variously including (and included in) string format() operations.
For example:

    temps = [24.369, 24.550, 26.807, 27.531, 28.752]
    out = 'Temperatures: {0} Celsius'.format(
        ', '.join('{0:.1f}'.format(t) for t in temps)
    )
    # => 'Temperatures: 24.4, 24.6, 26.8, 27.5, 28.8 Celsius'

This is just a simple example; my actual code has many more join and format operations, split into local variables as needed for clarity.

Then I remembered that Ye Olde Common Lisp's format operator had built-in list traversing capabilities[1]:

    (format t "Temperatures: ~{~1$~^, ~} Celsius" temps)

That format string (the part in the middle that looks like line noise) is admittedly arcane, but it's parsed like this:

    ~{    take the next argument (temps) and start iterating over its contents
    ~1$   output a floating point number with 1 digit of precision
    ~^    break the loop if there are no more items available
    ", "  (otherwise) output a comma and space
    ~}    end of the loop body

Now, as much as I appreciate the heritage of Lisp, I won't deny that its format string mini-language is EVIL. As a rule, format string placeholders should not include *imperative statements* such as for, break, continue, and if. We don't need a Turing-complete language in our format strings.

Still, this is the grand^n-father of Python's format strings, so it's interesting to look at how it used to approach the list joining issue.

Then I asked myself: can I take the list joining capability and port it over to Python's format(), doing away with the overall ugliness?

Here is what I came up with:

    out = 'Temperatures: {0:", ":.1f} Celsius'.format(temps)
    # => 'Temperatures: 24.4, 24.6, 26.8, 27.5, 28.8 Celsius'

Here ", " is the joiner between the items and <.1f> is the format string for each item.

The way this would work is by defining a specific Format Specification Mini-Language for sequences (such as lists, tuples, and other iterables).
A Format Specification Mini-Language (format_spec) is whatever follows the first colon in a curly brace placeholder, and is defined by the argument's class, so that it can vary wildly among different types.[2]

The built-in string and numeric types share the generic format_spec we are accustomed to[3]:

    [[fill]align][sign][#][0][width][,][.precision][type]

But that doesn't mean that more complex types should not define extensions or replacements. I propose this extended format_spec for sequences:

    seq_format_spec  ::= join_string [":" item_format_spec] | format_spec
    join_string      ::= '"' join_string_char* '"' | "'" join_string_char* "'"
    join_string_char ::= <any character except "{", "}", newline, or the quote>
    item_format_spec ::= format_spec

That is, if the format_spec for a sequence starts with ' or " it would be interpreted as a join operation (e.g. {0:", "} or {0:', '}), optionally followed by a format_spec for the single items: {0:", ":.1f}

If the format_spec does not start with ' or ", or if the quote is not balanced (does not appear again in the format_spec), then it's assumed to be a generic format string and the implementation would call super(). This is meant for backwards compatibility with existing code that may be using the generic format_spec over various sequences.

I do think that would be quite readable and useful. Look again at the example:

    out = 'Temperatures: {0:", ":.1f} Celsius'.format(temps)

As a bonus, it allows nested joins, albeit only for simple cases. For example, we could format a dictionary's items:

    temps = {'Rome': 26, 'Paris': 21, 'New York': 18}
    out = 'Temperatures: {0:", ":" "}'.format(temps.items())
    # => 'Temperatures: Rome 26, Paris 21, New York 18'

Here the format_spec for temps.items() is <", ":" ">. Then ", " would be used as a joiner between the item tuples and <" "> would be passed on as the format_spec for each tuple. This in turn would join the tuple's items using a single space and output each item with its default format. (An explicit :s at the end would not work here, because the integer values reject the s format type.)
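
To make the proposed rules concrete, here is a minimal sketch of how they could be prototyped in pure Python today. Everything in it is an assumption made for illustration: SeqFormatter and _format_item are hypothetical names, not part of any existing library, and the sequence has to be wrapped explicitly because built-in lists and tuples do not understand this spec. Recent Python 3 versions also reject a non-empty format_spec on a plain list, so the fallback branch below formats the list's string form instead of calling super().

```python
class SeqFormatter(list):
    """Hypothetical list subclass that understands the proposed seq_format_spec."""

    def __format__(self, spec):
        quote = spec[:1]
        if quote in ('"', "'"):
            end = spec.find(quote, 1)  # position of the closing quote, or -1
            if end != -1:
                joiner, rest = spec[1:end], spec[end + 1:]
                if rest == '' or rest.startswith(':'):
                    item_spec = rest[1:]  # empty when no item spec is given
                    return joiner.join(_format_item(x, item_spec) for x in self)
        # Not a join spec: apply the generic spec to the list's string form.
        return format(str(self), spec)


def _format_item(item, spec):
    # Wrap nested lists/tuples only when the item spec is itself a join spec.
    if spec[:1] in ('"', "'") and isinstance(item, (list, tuple)):
        item = SeqFormatter(item)
    return format(item, spec)


temps = SeqFormatter([24.369, 24.550, 26.807, 27.531, 28.752])
print('Temperatures: {0:", ":.1f} Celsius'.format(temps))

cities = {'Rome': 26, 'Paris': 21, 'New York': 18}
print('Temperatures: {0:", ":" "}'.format(SeqFormatter(cities.items())))
```

The sketch only recurses into nested lists and tuples when the item spec itself starts with a quote, so plain items are still handed to their own __format__ unchanged.
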
This could go on and on as needed, adding a colon and joiner string for each nested join operation.

A more complicated mini-language would be needed to output dicts using different format strings for keys and values, but I think that would be veering into unreadable territory.

What do you think?

I plan to write this as a module and propose it to Python's devs for inclusion in the main tree, but any criticism is welcome before I do that.

-Tobia

[1] http://www.gigamonkeys.com/book/a-few-format-recipes.html
[2] http://docs.python.org/3/library/string.html#formatstrings
[3] http://docs.python.org/3/library/string.html#formatspec