[Fredrik Lundh]
>>> bdict = dict.fromkeys(open(bfile).readlines())
>>>
>>> for line in open(afile):
>>>     if line not in bdict:
>>>         print line,
>>>
>>> </F>

[Tim Peters]
>> Note that an open file is an iterable object, yielding the lines in
>> the file.  The "for" loop exploited that above, but fromkeys() can
>> also exploit it.  That is,
>>
>>     bdict = dict.fromkeys(open(bfile))
>>
>> is good enough (there's no need for the .readlines()).

[/F]
> (sigh.  my brain knows that, but my fingers keep forgetting)
>
> and yes, for this purpose, "dict.fromkeys" can be replaced
> with "set".
>
>     bdict = set(open(bfile))
>
> (and then you can save a few more bytes by renaming the
> variable...)

[Tim Peters]
> Except the latter two are just shallow spelling changes.  Switching
> from fromkeys(open(f).readlines()) to fromkeys(open(f)) is much more
> interesting, since it can allow major reduction in memory use.  Even
> if all the lines in the file are pairwise distinct, not materializing
> them into a giant list can be a significant win.  I wouldn't have
> bothered replying if the only point were that you can save a couple
> bytes of typing <wink>.

fromkeys(open(f).readlines()) and fromkeys(open(f)) seem to be
equivalent.

When I pass an iterator instance (or a generator-iterator) to
dict.fromkeys, it is expanded at that moment, so fromkeys(open(f)) is
effectively the same as fromkeys(list(open(f))) and
fromkeys(open(f).readlines()).

Am I missing something?

Jane
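
For reference, here is a minimal Python 3 sketch of the idiom discussed
above. It is an illustration only, not code from anyone in the thread:
the names lines_only_in_a and write_temp and the sample file contents
are hypothetical. As far as I understand CPython, set() and
dict.fromkeys() pull items from the file iterator one at a time, so the
only container that grows is the set or dict itself, whereas
.readlines() first builds a list holding every line.

```python
import os
import tempfile


def lines_only_in_a(afile, bfile):
    """Return the lines of afile that do not appear in bfile."""
    # Iterating over the open file hands lines to set() one at a time;
    # no intermediate list of all lines is created along the way.
    with open(bfile) as b:
        seen = set(b)
    with open(afile) as a:
        return [line for line in a if line not in seen]


def write_temp(text):
    """Hypothetical helper: write text to a temp file, return its path."""
    fd, path = tempfile.mkstemp(text=True)
    with os.fdopen(fd, "w") as f:
        f.write(text)
    return path


# Made-up sample data: bfile repeats the same line three times.
afile = write_temp("apple\nbanana\ncherry\n")
bfile = write_temp("banana\nbanana\nbanana\n")
try:
    result = lines_only_in_a(afile, bfile)
    with open(bfile) as b:
        as_list = b.readlines()      # every line, duplicates included
    with open(bfile) as b:
        as_keys = dict.fromkeys(b)   # one key per distinct line
finally:
    os.remove(afile)
    os.remove(bfile)

print(result)
print(len(as_list), len(as_keys))
```

With duplicated lines the list from .readlines() holds every copy,
while the dict keeps one key per distinct line. When all lines are
distinct, the final dict is the same size either way, and the
difference is only the temporary list that .readlines() creates.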