Re: Reading \n unescaped from a file

Friedrich Rentsch Sun, 06 Sep 2015 05:48:09 -0700


On 09/06/2015 09:51 AM, Peter Otten wrote:

Friedrich Rentsch wrote:

My response was meant for the list, but went to Peter by mistake. So I
repeat it with some delay:

On 09/03/2015 04:24 PM, Peter Otten wrote:

Friedrich Rentsch wrote:

On 09/03/2015 11:24 AM, Peter Otten wrote:

Friedrich Rentsch wrote:

I appreciate your identifying two mistakes. I am curious to know what
they are.

Sorry for not being explicit.

                substitutes = [self.table [item] for item in hits if item in valid_hits] + []  # Make lengths equal for zip to work right

That looks wrong...

You are adding an empty list here. I wondered what you were trying to
achieve with that.

Right you are! It doesn't do anything. I remember my idea was to pad the
substitutes list by one, because the list of intervening text segments
is longer by one element and zip uses the least common length,
discarding all overhang. The remedy was totally ineffective and, what's
more, not needed, judging by the way the editor performs as expected.

That's because you are getting the same effect later by adding

nohits[-1]

You could avoid that by replacing [] with [""].

substitutes = list("12")
nohits = list("abc")
zipped = zip(nohits, substitutes)
"".join(list(reduce(lambda a, b: a+b, [zipped][0]))) + nohits[-1]

'a1b2c'

zipped = zip(nohits, substitutes + [""])
"".join(list(reduce(lambda a, b: a+b, [zipped][0])))

'a1b2c'

By the way, even those who are into functional programming might find

"".join(map("".join, zipped))

'a1b2c'

more readable.
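
For readers on Python 3, where zip() returns an iterator rather than a list, the truncation and the padding fix discussed above can be sketched roughly as follows. The variable names truncated, padded and longest are made up for illustration, and itertools.zip_longest is a standard-library alternative that was not used in this thread.

```python
from itertools import zip_longest

nohits = list("abc")       # text segments around the matches (one more than substitutes)
substitutes = list("12")   # replacement strings

# zip() stops at the shorter input, so the trailing "c" is silently dropped.
truncated = "".join(map("".join, zip(nohits, substitutes)))

# Padding substitutes with one empty string keeps the last segment.
padded = "".join(map("".join, zip(nohits, substitutes + [""])))

# Alternative (not from the thread): zip_longest() does the padding itself.
longest = "".join(map("".join, zip_longest(nohits, substitutes, fillvalue="")))

print(truncated)  # a1b2
print(padded)     # a1b2c
print(longest)    # a1b2c
```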

But there's a more general change that I suggest: instead of processing the
string twice, first to search for matches, then for the surrounding text, you
could achieve the same in one pass with a cool feature of the re.sub()
method -- it accepts a function:

import re

def replace(text, replacements):
    table = dict(replacements)
    def substitute(match):
        return table[match.group()]
    regex = "|".join(re.escape(find) for find, replace in replacements)
    return re.compile(regex).sub(substitute, text)

replace("1 foo 2 bar 1 baz", [("1", "one"), ("2", "two")])

'one foo two bar one baz'

I didn't think of using sub. But you're right. It is better, likely faster too. Building the regex from the reverse-sorted replacements will make it handle overlapping targets correctly, e.g.:


r = (
    ("1", "one"),
    ("2", "two"),
    ("12", "twelve"),
)

Your function as posted:

replace ('1 foo 2 bar 12 baz', r)
'one foo two bar onetwo baz'

With the regex line changed to:

regex = "|".join(re.escape(find) for find, replace in reversed(sorted(replacements)))


replace ('1 foo 2 bar 12 baz', r)
'one foo two bar twelve baz'
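
Putting the two pieces together, a self-contained Python 3 sketch might look like the following. The function name replace_overlapping is hypothetical, and sorting the table keys instead of the (find, replace) tuples is a small simplification that yields the same order here. It works because a target that is a prefix of another sorts before it, so the reversed order tries the longer one first.

```python
import re

def replace_overlapping(text, replacements):
    # Hypothetical name; combines the re.sub() callback with reverse-sorted targets.
    table = dict(replacements)
    # Reverse-sorted order puts "12" before "1", so the longer target is tried first.
    targets = sorted(table, reverse=True)
    regex = re.compile("|".join(re.escape(target) for target in targets))
    # sub() calls the function once per match and inserts its return value.
    return regex.sub(lambda match: table[match.group()], text)

r = (
    ("1", "one"),
    ("2", "two"),
    ("12", "twelve"),
)

result = replace_overlapping("1 foo 2 bar 12 baz", r)
print(result)  # one foo two bar twelve baz
```

Sorting by length instead, e.g. sorted(table, key=len, reverse=True), should give the same longest-first behavior for these targets.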

Thanks for the hints

Frederic

--
https://mail.python.org/mailman/listinfo/python-list
