On 08/09/2011 20:51, gry wrote:
[Python 2.7]
I have a body of text (~1MB) that I need to modify.   I need to look
for matches of a regular expression and replace a random selection of
those matches with a new string.  There may be several matches on any
line, and a random selection of them should be replaced.  The
probability of replacement should be adjustable.  Performance is not
an issue.  E.g: if I have:

SELECT max(PUBLIC.TT.I) AS SEL_0 FROM (SCHM.T RIGHT OUTER JOIN
PUBLIC.TT ON (SCHM.T.I IS NULL)) WHERE (NOT(NOT((power(PUBLIC.TT.F,
PUBLIC.TT.F) = cast(ceil((         SELECT 22 AS SEL_0        FROM
(PUBLIC.TT AS PUBLIC_TT_0 JOIN PUBLIC.TT AS PUBLIC_TT_1 ON (ceil(0.46)
=sin(PUBLIC_TT_1.F)))        WHERE ((zeroifnull(PUBLIC_TT_0.I) =
sqrt((0.02 + PUBLIC_TT_1.F))) OR

I might want to replace '(max|min|cos|sqrt|ceil)' with "public.\1", but
only with probability 0.7.  I looked and looked for some computed
thing in re's that I could stick an expression in, but could not find
such (for good reasons, I know).
Any ideas how to do this?  I would go for simple, even if it's wildly
inefficient, though elegance is always admired...

re.sub can accept a function as the replacement. It'll call the
function once for each match it finds, passing in the match object,
and the string returned by that function will be the replacement.

You could write a function which returns either the original substring
which was found or a different substring.
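
For what it's worth, here is a minimal sketch of that idea. The names
make_replacer and randomly_qualify are made up for illustration, and the
\b word boundaries are my own assumption (they stop "max" from matching
inside a longer identifier); the pattern in the question has none. It
should run unchanged on Python 2.7 and 3.

```python
import random
import re

# Function names taken from the question. The \b anchors are an
# assumption added here so that only whole words are matched.
PATTERN = re.compile(r'\b(max|min|cos|sqrt|ceil)\b')


def make_replacer(probability, rng=random):
    """Build a callback for re.sub that rewrites a match with the given
    probability and otherwise returns the matched text unchanged."""
    def replacer(match):
        if rng.random() < probability:
            # Same effect as the replacement template "public.\1".
            return 'public.' + match.group(1)
        # Leave this occurrence exactly as it was found.
        return match.group(0)
    return replacer


def randomly_qualify(text, probability=0.7, rng=random):
    """Prefix a random selection of the matched names with 'public.'."""
    return PATTERN.sub(make_replacer(probability, rng), text)


if __name__ == '__main__':
    sql = ("SELECT max(PUBLIC.TT.I) AS SEL_0 FROM T "
           "WHERE ceil(0.46) = sqrt((0.02 + PUBLIC_TT_1.F))")
    # The output differs from run to run, since each match is decided
    # independently.
    print(randomly_qualify(sql))
```

Each match gets its own call to the callback, so several matches on one
line are decided independently. Passing rng=random.Random(some_seed)
makes a run repeatable if that is useful for testing.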
--
http://mail.python.org/mailman/listinfo/python-list
