py> countdicts = [ ... dict(a=9, b=9, c=9), ... dict(a=8, b=7), ... dict(a=4, b=5, c=12)]
I need to select dicts with the constraint that the number of each 'word' totalled over all selected dicts doesn't exceed a given MAX_VALUE. Right now, I do this by:
py> def select(dicts, n): ... result = [] ... counts = {} ... def doadd(d): ... for label, count in d.iteritems(): ... if counts.get(label, 0) + count > n: ... return False ... return True ... for d in dicts: ... if doadd(d): ... result.append(d) ... for label, count in d.iteritems(): ... counts[label] = counts.get(label, 0) + count ... return result ...
Basically, I use a greedy approach -- adding a dict each time if I can. This leads to some suboptimal solutions given that, while the total counts must not exceed MAX_VALUE, I'd like them to be as close to MAX_VALUE as possible. An example of data on which my select function produces such a suboptimal solution:
py> countdicts = [ ... dict(a=9, b=9, c=9), ... dict(a=8, b=7), ... dict(a=4, b=5, c=12)] py> select(countdicts, 12) [{'a': 9, 'c': 9, 'b': 9}]
The better solution would have been to select the second two dicts -- then all 'words' would have count 12.
I don't need an optimal solution; in fact, the solution I have right now is tolerable. But if anyone knows a simple way to improve my performance, I'd appreciate it. Thanks!
STeVe
P.S. No, I'm not just making these problems up! I'm sick, but not that sick. ;) It has to do with resampling a set of data... If you're really interested, I can provide the full details, but they'll only make the problem even _more_ complicated. =)
--
http://mail.python.org/mailman/listinfo/python-list