Andreas Beyer <[EMAIL PROTECTED]> wrote: > Hi, > > I found the following quite cryptic code, which basically reads the > first column of some_file into a set. > In Python I am used to seeing much more verbose/explicit code. However, > the example below _may_ actually be faster than the usual "for line in ..." > Do you consider this code good Python style? Or would you recommend to > refrain from such complex single-line code?? > > Thanks! > Andreas > > inp = resource(some_file) > # read first entries of all non-empty lines into a set > some_set = frozenset([line.split()[0] for line in \ > filter(None, [ln.strip() for ln in inp])])
Sparse is better than dense, and this code is far from the fastest one could write -- it builds useless lists comprehensions where genexps would do, it splits off all whitespace-separated words (rather than just the first one) then tosses all but the first away, etc. frozenset(first_word_of_line for line in input_file for first_word_of_line in line.split(None, 1)[:1] ) does not sacrifice any speed (on the contrary), yet achieves somewhat better clarity by clearer variable names, reasonable use of whitespace for formatting, and avoidance of redundant processing. If this set didn't have to be frozen, building it in a normal for loop (with a .add call in the nested loop) would no doubt be just as good in terms of both clarity and performance; however, frozen sets (like tuples) don't lend themselves to such incremental building "in place" (you'd have to build the mutable version, then "cast" it at the end; the alternative of making a new frozenset each time through the loop is clearly unpleasant), so I can see why one would want to avoid that. A good, readable, and only slightly slower alternative is to write a named generator: def first_words(input_file): for line in input_file: first_word_if_any = line.split(None, 1) if first_word_if_any: yield first_word_if_any[0] ff = frozenset(first_words(input_file)) A full-fledged generator gives you more formatting choices, an extra name to help, and the possibility of naming intermediate variables. Whether it's worth it in this case is moot -- personally I find the "slice then for-loop" idiom (which I used in the genexp) just as clear as the "test then index" one (which I used in first_words) to express the underlying notion "give me the first item if any, but just proceed without giving anything if the list is empty"; but I can imagine somebody else deciding that the test/index idiom is a more direct expression of that notion. You could use it in a genexp, too, of course: frozenset(first_word_if_any[0] for line in input_file for first_word_if_any in [line.split(None, 1)] if first_word_if_any ) but this requires the "for x in [y]" trick to simulate the "x=y" assigment of an intermediate variable in a genexp or LC, which is never an elegant approach, so I wouldn't recommend it. Alex -- http://mail.python.org/mailman/listinfo/python-list