On May 17, 4:06 pm, John Machin <[EMAIL PROTECTED]> wrote:
> On May 18, 6:00 am, Torsten Bronger <[EMAIL PROTECTED]> wrote:
>
> > Hi there!
>
> > James Stroud writes:
> > > Torsten Bronger wrote:
>
> > >> I need some help with finding matches in a string that has some
> > >> characters which are marked as escaped (in a separate list of
> > >> indices).  Escaped means that they must not be part of any match.
>
> > >> [...]
>
> > > You should probably provide examples of what you are trying to do
> > > or you will likely get a lot of irrelevant answers.
>
> > Example string: u"Hollo", escaped positions: [4].  Thus, the second
> > "o" is escaped and must not be found by the regexp searches.
>
> > Instead of re.search, I call the function guarded_search(pattern,
> > text, offset) which takes care of escaped characters.  Thus, while
>
> >     re.search("o$", string)
>
> > will find the second "o",
>
> >     guarded_search("o$", string, 0)
>
> Huh? Did you mean 4 instead of zero?
>
> > won't find anything.
>
> Quite apart from the confusing use of "escape", your requirements are
> still as clear as mud. Try writing up docs for your "guarded_search"
> function. Supply test cases showing what you expect to match and what
> you don't expect to match. Is "offset" the offset in the text? If so,
> don't you really want a set of "forbidden" offsets, not just one?
>
> > But how to program "guarded_search"?
> > Actually, it is about changing the semantics of the regexp syntax:
> > "." doesn't mean anymore "any character except newline" but "any
> > character except newline and characters marked as escaped".
>
> Make up your mind whether you are "escaping" characters [likely to be
> interpreted by many people as position-independent] or "escaping"
> positions within the text.
>
> > And so
> > on, for all syntax elements of regular expressions.  Escaped
> > characters must spoil any match, however, the regexp machine should
> > continue to search for other matches.
>
> Whatever your exact requirement, it would seem unlikely to be so
> wildly popularly demanded as to warrant inclusion in the "regexp
> machine". You would have to write your own wrapper, something like the
> following totally-untested example of one possible implementation of
> one possible guess at what you mean:
>
> import re
> def guarded_search(pattern, text, forbidden_offsets, overlap=False):
>     regex = re.compile(pattern)
>     pos = 0
>     while True:
>         m = regex.search(text, pos)
>         if not m:
>             return
>         start, end = m.span()
>         for bad_pos in forbidden_offsets:
>             if start <= bad_pos < end:
>                 break
>         else:
>             yield m
>         if overlap:
>             pos = start + 1
>         else:
>             pos = end
> 8<-------
>
> HTH,
> John
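
For anyone who wants to try the quoted generator on the example from the thread, here is a minimal sketch of how it might be exercised. It follows the quoted logic (a match is dropped if its span covers any forbidden offset), but two details are my own assumptions and not part of the quoted code: the offsets are copied into a set, and the loop steps forward by one character after a zero-length match so that patterns such as "x*" cannot stall the search.

```python
import re

def guarded_search(pattern, text, forbidden_offsets, overlap=False):
    """Yield matches of pattern in text whose span covers no forbidden offset."""
    regex = re.compile(pattern)
    forbidden = set(forbidden_offsets)
    pos = 0
    # Assumption: stop once pos runs past the end of the text.
    while pos <= len(text):
        m = regex.search(text, pos)
        if not m:
            return
        start, end = m.span()
        # Reject the match if any forbidden offset lies inside [start, end).
        if not any(start <= bad < end for bad in forbidden):
            yield m
        if overlap or end == start:
            # Assumption: always advance after an empty match.
            pos = start + 1
        else:
            pos = end

text = "Hollo"
print(re.search("o$", text).start())              # plain search finds the final "o"
print(list(guarded_search("o$", text, [4])))      # guarded search rejects it
print([m.start() for m in guarded_search("o", "Hollo how are you", [4])])
```

Note that this rejects a match when a forbidden offset falls anywhere inside it, which is one possible reading of the requirement; a check on the start position alone would behave differently for multi-character matches.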

Here are two pyparsing-based routines, guardedSearch and
guardedSearchByColumn.  The first uses a pyparsing parse action to
reject any match that starts at a forbidden string location, and
returns a list of tuples containing the string location and the
matched text.  The second is a variant of guardedSearch that uses the
pyparsing built-ins col and lineno to filter matches by column instead
of by raw string location, and returns a list of tuples containing the
line and column of each match location and the matching text.  (Note
that string locations are zero-based, while line and column numbers
are 1-based.)

-- Paul


from pyparsing import Regex, ParseException, col, lineno

def guardedSearch(pattern, text, forbidden_offsets):
    # Parse action: veto any match that starts at a forbidden offset.
    def offsetValidator(strng, locn, tokens):
        if locn in forbidden_offsets:
            raise ParseException(strng, locn,
                                 "can't match at offset %d" % locn)
    regex = Regex(pattern).setParseAction(offsetValidator)
    return [(tokStart, toks[0])
            for toks, tokStart, tokEnd in regex.scanString(text)]

print(guardedSearch("o", "Hollo how are you", [4,]))


def guardedSearchByColumn(pattern, text, forbidden_columns):
    # Parse action: veto any match that starts in a forbidden column.
    def offsetValidator(strng, locn, tokens):
        if col(locn, strng) in forbidden_columns:
            raise ParseException(strng, locn,
                                 "can't match at column %d" % col(locn, strng))
    regex = Regex(pattern).setParseAction(offsetValidator)
    return [(lineno(tokStart, text), col(tokStart, text), toks[0])
            for toks, tokStart, tokEnd in regex.scanString(text)]

text = """\
alksjdflasjf;sa
a;sljflsjlaj
;asjflasfja;sf
aslfj;asfj;dsf
aslf;lajdf;ajsf
aslfj;afsj;sd
"""
print(guardedSearchByColumn(";", text, [1,6,11,]))

Prints:
[(1, 'o'), (7, 'o'), (15, 'o')]
[(1, 13, ';'), (2, 2, ';'), (3, 12, ';'), (5, 5, ';')]

-- 
http://mail.python.org/mailman/listinfo/python-list