Luís Pedro Coelho <l...@luispedro.org> added the comment:

Original poster here.

The benchmark is artificial, but the problem setting is not. I did have a problem that is roughly:

    interesting = set(line.strip() for line in open(...))

    for line in open(...):
        key,rest = line.split('\t', 1)
        if key in interesting:
            process(rest)

Deleting the set (when it goes out of scope) was a significant chunk of the time. Surprisingly, deleting a very large set takes much longer than creating it.

Here are my controlled measurements (created with the attached script, which itself uses Jug http://jug.rtfd.io and assumes a file `input.txt` is present).

         N  create (s)  delete (s)
         1        0.00        0.00
        10        0.00        0.00
       100        0.00        0.00
      1000        0.00        0.00
     10000        0.01        0.00
    100000        0.15        0.01
   1000000        1.14        0.12
  10000000       11.44        2.24
 100000000      126.41       70.34
 200000000      198.04      258.44
 300000000      341.27      646.81
 400000000      522.70     1044.97
 500000000      564.07     1744.54
 600000000     1380.04     3364.06
 700000000     1797.82     3300.20
 800000000     1294.20     4410.22
 900000000     3045.38     7646.59
1000000000     3467.31    11042.97
1100000000     5162.35    13750.22
1200000000     6581.90    12544.67
1300000000     1612.60    17204.67
1400000000     1788.13    23772.84
1500000000     6922.16    25068.49

----------
nosy: +l...@luispedro.org
Added file: https://bugs.python.org/file47448/time-set.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32846>
_______________________________________
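
For readers without the attachment, the create/delete timings reported in the comment above could be approximated on a small scale with a sketch like the one below. This is not the attached time-set.py (which uses Jug and reads `input.txt`); the function name `time_set` and the use of generated strings in place of file lines are assumptions made purely for illustration, and the sizes are kept far below those in the report.

```python
import time


def time_set(n):
    """Return (create_seconds, delete_seconds) for a set of n strings.

    Hypothetical helper: generated strings stand in for the stripped
    lines of input.txt used in the original report.
    """
    start = time.perf_counter()
    s = set(str(i) for i in range(n))
    created = time.perf_counter()
    # Dropping the only reference lets CPython free the set and its
    # elements immediately, which is the step being measured.
    del s
    deleted = time.perf_counter()
    return created - start, deleted - created


if __name__ == "__main__":
    print(f"{'N':>10}  {'create (s)':>10}  {'delete (s)':>10}")
    for n in (10**3, 10**4, 10**5, 10**6):
        create, delete = time_set(n)
        print(f"{n:>10}  {create:>10.2f}  {delete:>10.2f}")
```

At these sizes deletion will likely look cheap; the report above suggests the gap only becomes dramatic once the set holds hundreds of millions of elements.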