Hello!

John Machin writes:

> On May 18, 6:00 am, Torsten Bronger <[EMAIL PROTECTED]>
> wrote:
>
>> [...]
>>
>> Example string: u"Hollo", escaped positions: [4].  Thus, the
>> second "o" is escaped and must not be found by the regexp
>> searches.
>>
>> Instead of re.search, I call the function guarded_search(pattern,
>> text, offset) which takes care of escaped characters.  Thus, while
>>
>>     re.search("o$", string)
>>
>> will find the second "o",
>>
>>     guarded_search("o$", string, 0)
>
> Huh?  Did you mean 4 instead of zero?

No, the "offset" parameter is like the "pos" parameter in the search
method of regular expression objects.  It's roughly like

    guarded_search("o$", string[offset:])

Actually, my real guarded_search has an "endpos" parameter, too.

> [...]
>
> Quite apart from the confusing use of "escape", your requirements are
> still as clear as mud.  Try writing up docs for your "guarded_search"
> function.

Note that I don't want to add functionality to the stdlib; I just
want to solve my tiny annoying problem.  Okay, here is a more
complete story:

I've specified a simple text document syntax, like reStructuredText,
Wikimedia, LaTeX or whatever.  I already have a preprocessor for it;
now I am trying to implement the parser.  A section heading looks
like this:

    Introduction
    ============

Thus, my parser searches (among many other things) for
r"\n\s*={4,}\s*$".

However, the author can escape any character with a backslash:

    Introduction    or    Introduction
    \===========          ====\=======

This means the first (or fifth) equals sign is a literal equals sign
and not part of a heading underline.  It must not be interpreted as
the start of a section.  The preprocessor generates u"==========="
with escaped_positions=[0].  (Or [4], in the right-hand case.)

This is why I cannot use the normal search methods.

> [...]
>
> Whatever your exact requirement, it would seem unlikely to be so
> wildly popularly demanded as to warrant inclusion in the "regexp
> machine".
> You would have to write your own wrapper, something like
> the following totally-untested example of one possible
> implementation of one possible guess at what you mean:
>
> import re
> def guarded_search(pattern, text, forbidden_offsets, overlap=False):
>     regex = re.compile(pattern)
>     pos = 0
>     while True:
>         m = regex.search(text, pos)
>         if not m:
>             return
>         start, end = m.span()
>         for bad_pos in forbidden_offsets:
>             if start <= bad_pos < end:
>                 break
>         else:
>             yield m
>         if overlap:
>             pos = start + 1
>         else:
>             pos = end
> 8<-------

This is similar to my current approach.  However, it also finds too
many "^a" patterns, because it starts a fresh search at different
positions.

Bye,
Torsten.

-- 
Torsten Bronger, aquisgrana, europa vetus
                                      Jabber ID: [EMAIL PROTECTED]
               (See http://ime.webhop.org for ICQ, MSN, etc.)
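
For illustration only, here is a minimal sketch of the filtering idea discussed above. It rests on two assumptions that are not from the original post: that the extra "^a" hits come from slicing the text (string[offset:]) rather than from passing "pos", and that escaped positions are given as absolute indices into the whole text. The name guarded_finditer is hypothetical and is not the guarded_search from the thread. The sketch relies on the documented behaviour that "^" does not match at "pos" unless "pos" is the real start of the string (or a line start under re.MULTILINE).

```python
import re


def guarded_finditer(pattern, text, escaped_positions, pos=0, endpos=None):
    """Yield matches of pattern in text that cover no escaped position.

    Hypothetical helper for illustration; escaped_positions are assumed
    to be absolute indices into text.
    """
    regex = re.compile(pattern)
    escaped = set(escaped_positions)
    if endpos is None:
        endpos = len(text)
    # A single finditer pass over the unsliced text: "^" stays tied to the
    # real start of the string instead of matching at every restart point.
    for m in regex.finditer(text, pos, endpos):
        # Reject the match if any character it covers is escaped.
        if not any(i in escaped for i in range(m.start(), m.end())):
            yield m


# Passing pos is not equivalent to slicing: "^" does not match at pos.
anchored = re.compile("^a")
assert anchored.search("ba", 1) is None
assert anchored.search("ba"[1:]) is not None

# The u"Hollo" example with position 4 escaped: the final "o" is skipped.
assert [m.start() for m in guarded_finditer("o", "Hollo", [4])] == [1]
assert list(guarded_finditer("o$", "Hollo", [4])) == []

# A heading underline whose first "=" (index 13) is escaped is not reported.
underline = r"\n\s*={4,}\s*$"
text = "Introduction\n==========="
assert list(guarded_finditer(underline, text, [13])) == []
assert len(list(guarded_finditer(underline, text, []))) == 1
```

One limitation of this sketch: a rejected match is dropped as a whole, so a shorter alternative match that starts at the same position and avoids the escaped character is never tried. Whether that matters depends on the patterns the parser uses.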