James Stroud wrote:
> James Stroud wrote:
>> John Pye wrote:
>>> Hi all
>>>
>>> I have a file with a bunch of perl regular expressions like so:
>>>
>>> /(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/ # bold
>>> /(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/ # italic bold
>>> /(^|[\s\(])\_([^ ].*?[^ ])\_([\s\)\.\,\:\;\!\?]|$)/$1''$2''$3/ # italic
>>>
>>> These are all find/replace expressions delimited as
>>> '/search/replace/ # comment' where 'search' is the regular expression
>>> we're searching for and 'replace' is the replacement expression.
>>>
>>> Is there an easy and general way that I can split these perl-style
>>> find-and-replace expressions into something I can use with Python, e.g.
>>> re.sub('search','replace',str) ?
>>>
>>> I thought generally it would be good enough to split on '/' but as you
>>> see the <\/b> messes that up. I really don't want to learn perl
>>> here :-)
>>>
>>> Cheers
>>> JP
>>>
>>
>> This could be more general; in principle a perl regex could end with a
>> "\", e.g. "\\/", but I'm guessing that won't happen here.
>>
>> py> for p in perlish:
>> ...   print p
>> ...
>> /(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/
>> /(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/
>> /(^|[\s\(])\_([^ ].*?[^ ])\_([\s\)\.\,\:\;\!\?]|$)/$1''$2''$3/
>> py> import re
>> py> splitter = re.compile(r'[^\\]/')
>> py> for p in perlish:
>> ...   print splitter.split(p)
>> ...
>> ['/(^|[\\s\\(])\\*([^ ].*?[^ ])\\*([\\s\\)\\.\\,\\:\\;\\!\\?]|$', "$1'''$2'''$", '']
>> ['/(^|[\\s\\(])\\_\\_([^ ].*?[^ ])\\_\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$', "$1''<b>$2<\\/b>''$", '']
>> ['/(^|[\\s\\(])\\_([^ ].*?[^ ])\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$', "$1''$2''$", '']
>>
>> (I'm hoping this doesn't wrap!)
>>
>> James
>
> I realized that threw away the closing parentheses. This is the correct
> version:
>
> py> splitter = re.compile(r'(?<!\\)/')
> py> for p in perlish:
> ...   print splitter.split(p)
> ...
> ['', '(^|[\\s\\(])\\*([^ ].*?[^ ])\\*([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', "$1'''$2'''$3", '']
> ['', '(^|[\\s\\(])\\_\\_([^ ].*?[^ ])\\_\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', "$1''<b>$2<\\/b>''$3", '']
> ['', '(^|[\\s\\(])\\_([^ ].*?[^ ])\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', "$1''$2''$3", '']

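There is another problem with escaped backslashes: the lookbehind only
checks the single character before the '/', so a '/' that follows an
escaped backslash (\\) is wrongly treated as escaped and is not split:

>>> re.compile(r'(?<!\\)/').split(r"/abc\\/def/")
['', 'abc\\\\/def', '']

Peter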
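
One way around the escaped-backslash case is to stop using a lookbehind and scan the string once, treating every backslash as consuming the character after it. The following is only a sketch, written for Python 3; `split_unescaped` is a made-up helper name, not something from the `re` module, and it assumes the expressions use `/` as the delimiter and contain no other Perl quoting constructs.

```python
def split_unescaped(text, delim="/"):
    """Split text on delim, ignoring any delim that is backslash-escaped.

    Hypothetical helper for illustration. A backslash always pairs with the
    character that follows it, so "\\\\/" (escaped backslash, then a real
    delimiter) splits, while "\\/" (escaped delimiter) does not.
    """
    fields = []
    current = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Keep the backslash and the escaped character together, verbatim.
            current.append(text[i:i + 2])
            i += 2
        elif ch == delim:
            # Unescaped delimiter: close the current field.
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


# The case that defeats the lookbehind version:
tricky = split_unescaped(r"/abc\\/def/")
print(tricky)  # ['', 'abc\\\\', 'def', '']

# One of the expressions from the original question:
expr = r"/(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/"
_, search, replace, trailer = split_unescaped(expr)
print(search)
print(replace)  # $1''<b>$2<\/b>''$3
```

The fields come back with their escapes untouched, so the replacement still contains `<\/b>` and Perl-style `$1` references; those would need translating separately before the pair can be handed to `re.sub`.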