On 2015-12-03 15:12, Adam Funk wrote:
I'm having trouble with some input files that are almost all proper
UTF-8 but with a couple of troublesome characters mixed in, which I'd
like to ignore instead of throwing ValueError.  I've found the
openhook for the encoding

for line in fileinput.input(options.files,
                            openhook=fileinput.hook_encoded("utf-8")):
    do_stuff(line)

which the documentation describes as "a hook which opens each file
with codecs.open(), using the given encoding to read the file", but
I'd like codecs.open() to also have the errors='ignore' or
errors='replace' effect.  Is it possible to do this?

It looks like it's not possible with the standard "hook_encoded", but
you could write your own alternative:

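import codecs
import fileinput

def my_hook_encoded(encoding, errors):
    # Return an opener with the (path, mode) signature that fileinput expects.
    def opener(path, mode):
        return codecs.open(path, mode, encoding=encoding, errors=errors)

    return opener

# my_hook_encoded is defined above; it is not part of the fileinput module.
for line in fileinput.input(options.files,
                            openhook=my_hook_encoded("utf-8", "ignore")):
    do_stuff(line)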
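
To see what the two error policies do in practice, here is a small self-contained sketch. The helper names (hook_with_errors, read_lines) and the sample file contents are made up for illustration; it assumes Python 3, where fileinput.input() can be used as a context manager.

```python
import codecs
import fileinput
import os
import tempfile


def hook_with_errors(encoding, errors):
    """Hypothetical helper: an openhook that also passes an error policy."""
    def opener(path, mode):
        return codecs.open(path, mode, encoding=encoding, errors=errors)
    return opener


def read_lines(path, errors):
    """Collect every line of `path`, decoding UTF-8 with the given policy."""
    hook = hook_with_errors("utf-8", errors)
    with fileinput.input([path], openhook=hook) as stream:
        return [line for line in stream]


# Build a throwaway sample file: valid UTF-8 plus one stray 0xff byte.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"good line\nbad \xff byte\n")

ignored = read_lines(sample_path, "ignore")    # bad byte silently dropped
replaced = read_lines(sample_path, "replace")  # bad byte becomes U+FFFD
os.remove(sample_path)

print(ignored)
print(replaced)
```

With "ignore" the offending byte simply disappears from the line, while "replace" leaves a U+FFFD marker where it was, which can make damaged input easier to spot later. On Python 3.6 and later, fileinput.hook_encoded() itself accepts an optional errors argument, so a custom hook may not be needed there.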

--
https://mail.python.org/mailman/listinfo/python-list
