Jacques Grove <aquara...@gmail.com> added the comment:

More an observation than a bug:
I understand that we're trading memory for performance, but I've noticed that the peak memory usage is rather high, e.g.:

    $ cat test.py
    import os
    import regex as re

    def resident():
        for line in open('/proc/%d/status' % os.getpid(), 'r').readlines():
            if line.startswith("VmRSS:"):
                return line.split(":")[-1].strip()

    cache = {}
    print resident()
    for i in xrange(0, 1000):
        cache[i] = re.compile(str(i) + "(abcd12kl|efghlajsdf|ijkllakjsdf|mnoplasjdf|qrstljasd|sdajdwxyzlasjdf|kajsdfjkasdjkf|kasdflkasjdflkajsd|klasdfljasdf)")
    print resident()

Execution output on my machine (Linux x86_64, Python 2.6.5):

    4328 kB
    32052 kB

With the standard re module instead:

    3688 kB
    5428 kB

So it looks like around 16x the memory per pattern vs. the standard re module. The example is pretty silly, and the difference is even larger for more complex regexes.

I also understand that once the patterns are GC-ed, Python can reuse the memory (pymalloc doesn't return it to the OS, unfortunately). However, I have some applications that use large numbers (many thousands) of regexes and need to keep them cached (compiled) indefinitely, especially because compilation is expensive. This causes some pain (long story).

I've played around with increasing RE_MIN_FAST_LENGTH, and it makes a significant difference, e.g. with RE_MIN_FAST_LENGTH = 10:

    4324 kB
    25976 kB

In my use cases, having a larger RE_MIN_FAST_LENGTH doesn't make a huge performance difference, so that might be the way I'll go.

----------
_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue2636>
_______________________________________
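[Editor's note: the VmRSS parsing above is Linux-specific and the script is Python 2. A minimal, portable sketch of the same measurement idea, using only the stdlib on Python 3 (resource.getrusage reports peak RSS on Unix; the pattern and loop count here are just placeholders, and "import regex" can be substituted for "import re" to reproduce the comparison):]

```python
# Sketch: measure peak RSS growth while caching compiled patterns.
# Uses the stdlib re module; swap in "import regex as re" to compare
# against the third-party regex module as in the report above.
# Note: resource is Unix-only; ru_maxrss is in kB on Linux, bytes on macOS.
import re
import resource

def peak_rss():
    # Peak resident set size of this process so far.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

cache = {}
before = peak_rss()
for i in range(1000):
    # Placeholder alternation, shortened from the original test pattern.
    cache[i] = re.compile(str(i) + "(abcd12kl|efghlajsdf|ijkllakjsdf)")
after = peak_rss()
print("peak RSS before: %d, after: %d" % (before, after))
```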