The straightforward way to solve this problem is to create a dictionary. Like so:
[...]
a, b = get_information(line) if a in dict.keys(): dict[a].append(b) else: dict[a] = [b]
So I timed the three suggestions with a few different datasets:
> cat builddict.py def askpermission(d, k, v): if k in d: d[k].append(v) else: d[k] = [v]
def askforgiveness(d, k, v): try: d[k].append(v) except KeyError: d[k] = [v]
def default(d, k, v): d.setdefault(k, []).append(v)
def test(items, func): d = {} for k, v in items: func(d, k, v)
Dataset where every value causes a collision:
> python -m timeit -s "import builddict as bd" "bd.test([(100, i) for i in range(1000)], bd.askpermission)"
1000 loops, best of 3: 1.62 msec per loop
> python -m timeit -s "import builddict as bd" "bd.test([(100, i) for i in range(1000)], bd.askforgiveness)"
1000 loops, best of 3: 1.58 msec per loop
> python -m timeit -s "import builddict as bd" "bd.test([(100, i) for i in range(1000)], bd.default)"
100 loops, best of 3: 2.03 msec per loop
Dataset where no value causes a collision:
> python -m timeit -s "import builddict as bd" "bd.test([(i, i) for i in range(1000)], bd.askpermission)"
100 loops, best of 3: 2.29 msec per loop
> python -m timeit -s "import builddict as bd" "bd.test([(i, i) for i in range(1000)], bd.askforgiveness)"
100 loops, best of 3: 9.96 msec per loop
> python -m timeit -s "import builddict as bd" "bd.test([(i, i) for i in range(1000)], bd.default)"
100 loops, best of 3: 2.98 msec per loop
Datset where one of every 5 values causes a collision:
> python -m timeit -s "import builddict as bd" "bd.test([(i % 5, i) for i in range(1000)], bd.askpermission)"
1000 loops, best of 3: 1.82 msec per loop
> python -m timeit -s "import builddict as bd" "bd.test([(i % 5, i) for i in range(1000)], bd.askforgiveness)"
1000 loops, best of 3: 1.79 msec per loop
> python -m timeit -s "import builddict as bd" "bd.test([(i % 5, i) for i in range(1000)], bd.default)"
100 loops, best of 3: 2.2 msec per loop
So, when there are lots of collisions, you may get a small benefit from the try/except solution. If there are very few collisions, you probably would rather the if/else solution. The setdefault solution patterns about like the if/else solution, but is somewhat slower.
I will probably continue to use setdefault, because I think it's prettier =) but if you're running into a speed bottleneck in this sort of situation, you might consider one of the other solutions.
Steve -- http://mail.python.org/mailman/listinfo/python-list