[issue2650] re.escape should not escape underscore

SilentGhost Mon, 14 Mar 2011 07:48:14 -0700

SilentGhost <ghost....@gmail.com> added the comment:

I think these are two different questions:
 1. What to escape
 2. What to do about the poor performance of re.escape when it is implemented with re.sub


In my opinion, there isn't any justifiable reason to escape non-meta 
characters: escaping them doesn't affect matching, and escaped strings are 
typically just reused in a regex.
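
A minimal sketch of the point above, using the underscore from the issue title. The sample text "foo_bar" is made up for illustration; the only claim is that a pattern with the underscore escaped and one without it match the same text.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "foo_bar"

plain = re.compile("foo_bar")        # underscore left alone
escaped = re.compile(r"foo\_bar")    # underscore escaped, which is harmless but unnecessary

plain_match = plain.fullmatch(text)
escaped_match = escaped.fullmatch(text)

# Both patterns match the whole string and capture the same text.
assert plain_match is not None
assert escaped_match is not None
assert plain_match.group() == escaped_match.group()
```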

I would favour simpler and cleaner code with re.sub. I don't think that 
re.escape could be a performance bottleneck in any application. I did some 
profiling with python3.2 and it seems that the reason for this poor performance 
is the many abstraction layers involved when using re.sub. However, we need to 
bear in mind that we're only talking about a 40 usec difference for a 100-char 
string (string.printable): I'd think that strings being escaped are typically 
shorter.
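
One way such a measurement could be reproduced is sketched below with timeit. The helper name escape_with_sub and the repeat count are assumptions made for this example, and the str pattern is only a guess at what a re.sub-based escape would look like. Absolute numbers depend on the machine and the Python version, so none are quoted here.

```python
import re
import string
import timeit

# Assumed character class of regex metacharacters (str version).
META = r'([][.^$*+?{}\\|()])'

def escape_with_sub(pattern):
    # Hypothetical re.sub-based escape for str patterns; not taken from any patch.
    return re.sub(META, r'\\\1', pattern)

sample = string.printable  # the 100-character string mentioned above

n = 2000  # arbitrary repeat count, chosen to keep the run short
per_call_sub = timeit.timeit(lambda: escape_with_sub(sample), number=n) / n
per_call_builtin = timeit.timeit(lambda: re.escape(sample), number=n) / n

print("re.sub-based escape: %.1f usec per call" % (per_call_sub * 1e6))
print("current re.escape:   %.1f usec per call" % (per_call_builtin * 1e6))
```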

As a compromise, I tested this code:

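from re import sub  # already in scope inside re.py; needed when run standalone

# Map each regex metacharacter to its backslash-escaped form.
_mp = {ord(i): '\\' + i for i in '][.^$*+?{}\\|()'}

def escape(pattern):
    if isinstance(pattern, str):
        # str: a single pass with str.translate
        return pattern.translate(_mp)
    # bytes: fall back to re.sub
    return sub(br'([][.^$*+?{}\\|()])', br'\\\1', pattern)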

which is fast (faster than the existing code) for str patterns and slow for 
bytes patterns. I don't particularly like it, because of the difference between 
str and bytes handling, but I do think that it will be much easier to "fix" 
once/when/if the re module is improved.
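
A quick sanity check of the compromise could look like the sketch below. It restates the function so that the block runs on its own, and the sample strings are made up for illustration. The check is that an escaped pattern matches its own source text literally, and that the str and bytes branches agree.

```python
import re

# Same table and function as the compromise above, repeated to be self-contained.
_mp = {ord(c): '\\' + c for c in '][.^$*+?{}\\|()'}

def escape(pattern):
    if isinstance(pattern, str):
        return pattern.translate(_mp)
    return re.sub(br'([][.^$*+?{}\\|()])', br'\\\1', pattern)

# Hypothetical ASCII-only inputs covering metacharacters and plain characters.
samples = ["a.b", "foo_bar", "1+1=2", "[x]{2}|(y)\\z"]

for s in samples:
    b = s.encode("ascii")
    # The escaped pattern must match the original text exactly.
    assert re.fullmatch(escape(s), s) is not None
    assert re.fullmatch(escape(b), b) is not None
    # Both branches should produce the same escaping.
    assert escape(b) == escape(s).encode("ascii")
```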

----------
keywords:  -patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue2650>
_______________________________________