gry wrote:
> [Python 2.7]
> I have a body of text (~1MB) that I need to modify. I need to look
> for matches of a regular expression and replace a random selection of
> those matches with a new string. There may be several matches on any
> line, and a random selection of them should be replaced. The
> probability of replacement should be adjustable. Performance is not
> an issue. E.g: if I have:
>
> SELECT max(PUBLIC.TT.I) AS SEL_0 FROM (SCHM.T RIGHT OUTER JOIN
> PUBLIC.TT ON (SCHM.T.I IS NULL)) WHERE (NOT(NOT((power(PUBLIC.TT.F,
> PUBLIC.TT.F) = cast(ceil(( SELECT 22 AS SEL_0 FROM
> (PUBLIC.TT AS PUBLIC_TT_0 JOIN PUBLIC.TT AS PUBLIC_TT_1 ON (ceil(0.46)
> =sin(PUBLIC_TT_1.F))) WHERE ((zeroifnull(PUBLIC_TT_0.I) =
> sqrt((0.02 + PUBLIC_TT_1.F))) OR
>
> I might want to replace '(max|min|cos|sqrt|ceil)' with "public.\1", but
> only with probability 0.7. I looked and looked for some computed
> thing in re's that I could stick an expression in, but could not find
> such (for good reasons, I know).
> Any ideas how to do this? I would go for simple, even if it's wildly
> inefficient, though elegance is always admired...

import random
import re

def make_sub(text, probability):
    def sub(match):
        # prepend text to the match with the given probability
        if random.random() < probability:
            return text + match.group(1)
        return match.group(1)
    return sub

# sample is the text to be modified
print re.compile("(max|min|cos|sqrt|ceil)").sub(make_sub(r"public.", .7), sample)

or even

def make_sub(text, probability):
    def sub(match):
        if random.random() < probability:
            # replace each \N in text with group N of the outer match
            def group_sub(m):
                return match.group(int(m.group(1)))
            return re.compile(r"[\\](\d+)").sub(group_sub, text)
        return match.group(0)
    return sub

print re.compile("(max|min|cos|sqrt|ceil)").sub(make_sub(r"public.\1", .7), sample)
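
A possible variant, not part of the reply above: the match object's expand() method from the standard re module resolves backreferences such as \1 in a template, so the hand-written group substitution may not be needed. This is only a sketch; the function name replace_randomly, the rng parameter, and the shortened sample string are made up for illustration. It avoids the print statement so that it runs on both Python 2.7 and Python 3.

```python
import random
import re


def replace_randomly(pattern, template, text, probability, rng=random):
    """Replace each match of pattern with template, with the given probability.

    rng is a hypothetical hook: any object with a random() method,
    e.g. a seeded random.Random instance for repeatable runs.
    """
    def sub(match):
        if rng.random() < probability:
            # expand() fills in backreferences such as \1 from this match
            return match.expand(template)
        # leave this match unchanged
        return match.group(0)
    return re.sub(pattern, sub, text)


# shortened stand-in for the SQL text in the question
sample = "SELECT max(PUBLIC.TT.I) FROM T WHERE ceil(0.46) = sqrt(PUBLIC.TT.F)"
pattern = r"(max|min|cos|sqrt|ceil)"

# probability 1.0 replaces every match, 0.0 replaces none
all_replaced = replace_randomly(pattern, r"public.\1", sample, 1.0)
none_replaced = replace_randomly(pattern, r"public.\1", sample, 0.0)

# seeded generator so the random selection is repeatable
some_replaced = replace_randomly(pattern, r"public.\1", sample, 0.7,
                                 random.Random(42))
```

Note that the pattern matches anywhere in the text, including inside longer identifiers (for example "min" inside "admin"); wrapping the alternation in \b word boundaries would restrict it to whole words.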