Re: selecting dictionaries to maximize counts

Steven Bethard Fri, 18 Feb 2005 15:20:05 -0800

John Machin wrote:

Steven Bethard wrote:
I have a list of dictionaries. Each dictionary holds counts of various 'words'... ... Basically, I use a greedy approach -- adding a dict each time if I can. ... This leads to some suboptimal solutions given that, while the total counts must not exceed MAX_VALUE, I'd like them to be as close to MAX_VALUE as possible. An example of data on which my select function produces such a suboptimal solution:
py> countdicts = [
...     dict(a=9, b=9, c=9),
...     dict(a=8, b=7),
...     dict(a=4, b=5, c=12)]
py> select(countdicts, 12)
[{'a': 9, 'c': 9, 'b': 9}]
The better solution would have been to select the second two dicts --
then all 'words' would have count 12.
I don't need an optimal solution; in fact, the solution I have right
now is tolerable.  But if anyone knows a simple way to improve my
performance, I'd appreciate it.  Thanks!
You should specify a function that assigns a score to valid solutions;
it's not apparent from the wording above and the example exactly what
that function might be. Thinking through the alternatives might help
you solve the problem yourself :-)

Yeah, I guess that's a good idea. Here's a scoring function that should give the right range of scores (where 1.0 is perfect and 0.0 is failure):

py> from __future__ import division
py> def score(origdicts, selecteddicts, n):
...     origwords = set(word for d in origdicts for word in d)
...     counts = {}
...     for d in selecteddicts:
...         for word, count in d.iteritems():
...             counts[word] = counts.get(word, 0) + count
...             if counts[word] > n:
...                 return 0.0
...     return sum(counts.itervalues())/len(origwords)/n
...
py> countdicts = [
...     dict(a=9, b=9, c=9),
...     dict(a=8, b=7),
...     dict(a=4, b=5, c=12)]
py> score(countdicts, countdicts, 12)
0.0
py> score(countdicts, select(countdicts, 12), 12)
0.75
py> score(countdicts, [dict(a=8, b=7), dict(a=4, b=5, c=12)], 12)
1.0

It's pretty easy to build a configuration that can't be solved with a perfect score:

py> def subsets(lst): ... if not lst: ... yield [] ... else: ... first = lst[:1] ... for subset in subsets(lst[1:]): ... yield subset ... yield first + subset ... py> countdicts = [ ... dict(a=1, b=2, c=3), ... dict(a=4, b=5, c=6), ... dict(a=7, b=8, c=9)] py> for dicts in subsets(countdicts): ... print score(countdicts, dicts, 9), dicts ... 0.0 [] 0.222222222222 [{'a': 1, 'c': 3, 'b': 2}] 0.555555555556 [{'a': 4, 'c': 6, 'b': 5}] 0.777777777778 [{'a': 1, 'c': 3, 'b': 2}, {'a': 4, 'c': 6, 'b': 5}] 0.888888888889 [{'a': 7, 'c': 9, 'b': 8}] 0.0 [{'a': 1, 'c': 3, 'b': 2}, {'a': 7, 'c': 9, 'b': 8}] 0.0 [{'a': 4, 'c': 6, 'b': 5}, {'a': 7, 'c': 9, 'b': 8}] 0.0 [{'a': 1, 'c': 3, 'b': 2}, {'a': 4, 'c': 6, 'b': 5}, {'a': 7, 'c': 9, 'b': 8}]

Hmmm... I can't do an exhaustive search like this one here -- I have ~8000 dicts, so 2**N is definitely too large of a space to search. Now that I have a scoring routine maybe I could do an A* search though...

Thanks for the input!

STeVe
--
http://mail.python.org/mailman/listinfo/python-list

Re: selecting dictionaries to maximize counts

Reply via email to