I have a task to search an incoming string for multiple patterns and replace 
each match with its replacement. I'm storing all the patterns as keys in a 
dict and the replacements as values, compiling the patterns with the re 
module, and calling the sub method on the compiled pattern object to do the 
replacement. The problem is that I have tens of millions of rows to check 
against roughly 1000 patterns, and this turns out to be a very expensive 
operation.

What can be done to optimize it? Also, my patterns contain special 
characters that must be matched literally; where do I specify the raw 
string combinations?

For example, if the search string is a literal rather than a variable, we 
can write

re.sub(r"\$%^search_text", "replace_text", "some_text")

but when I read the pattern from the dict, where should I put the "r" 
prefix? Unfortunately, putting it inside the key, like r"key", doesn't 
work.
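A note on the raw-string question, with a minimal sketch (the key and test string here are made up): the r"" prefix is purely a source-code spelling for literals; a string read from a dict at runtime needs no prefix. To have its characters treated literally inside a regex, escape them with re.escape() instead:

```python
import re

# The r"" prefix only affects how a literal is written in source code;
# a string pulled from a dict at runtime cannot take one. re.escape()
# backslash-escapes regex metacharacters, so the key matches literally.
key = "$%^search_text"          # hypothetical dict key
pattern = re.compile(re.escape(key))

print(pattern.sub("replace_text", "some $%^search_text here"))
# -> some replace_text here
```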

Pseudo code

# compile once, not once per row
pattern = re.compile('|'.join(re.escape(k) for k in regex_map))
for string in genobj_of_million_strings:
    # the lambda receives a match object, not the matched text
    yield pattern.sub(lambda m: regex_map[m.group(0)], string)
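A runnable sketch of the optimized approach (regex_map and the sample strings are hypothetical stand-ins for the real data): compile the combined pattern once outside the loop, escape the keys, and look the replacement up via the match object.

```python
import re

# Hypothetical mapping of literal strings to their replacements.
regex_map = {"cat": "dog", "bird": "fish"}

# Compile ONCE, outside the loop: rebuilding a 1000-alternative
# pattern for every row is what makes the loop so expensive.
pattern = re.compile("|".join(re.escape(k) for k in regex_map))

def replace_all(strings):
    # m.group(0) is the matched text; the match object itself
    # cannot be used as a dict key.
    for s in strings:
        yield pattern.sub(lambda m: regex_map[m.group(0)], s)

print(list(replace_all(["the cat saw a bird", "no match"])))
# -> ['the dog saw a fish', 'no match']
```

If one key is a prefix of another (e.g. "cat" and "cats"), sort the keys longest-first before joining, since alternation tries the branches left to right.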
-- 
https://mail.python.org/mailman/listinfo/python-list
