Jukka Aho wrote:
> When converting Unicode strings to legacy character encodings, it is
> possible to register a custom error handler that will catch and process
> all code points that do not have a direct equivalent in the target
> encoding (as described in PEP 293).
>
> The thing to note here is that the error handler itself is required to
> return the substitutions as Unicode strings - not as the target
> encoding's bytestrings. Some lower-level gadgetry will silently convert
> these strings to the target encoding.
>
> That is, if the substitution _itself_ doesn't contain illegal code
> points for the target encoding.
>
> Which brings us to the point: if my error handler for some reason
> returns illegal substitutions (from the viewpoint of the target
> encoding), how can I catch _these_ errors and make things good again?
>
> I thought it would work automatically, by calling the error handler as
> many times as necessary, and letting it work out the situation, but it
> apparently doesn't. Sample code follows:
>
> # So the question becomes: how can I make this work
> # in a graceful manner?
Change the return statement to this:

    return (substitution.encode(error.encoding, "practical")
                        .decode(error.encoding),
            error.start + 1)

That is, run the substitution itself through the same "practical" error
handler before returning it, so any characters in it that are illegal in
the target encoding get handled recursively as well.

-- 
Serge
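To illustrate the idea, here is a minimal, self-contained sketch of such a
handler. The handler name "practical" comes from the thread; the
substitution format (`<XXXX>` hex escapes) is my own invention for the
example, not the original poster's code. The key line is the round-trip
through `encode(..., "practical")`, which re-applies the handler to the
substitution itself:

```python
import codecs

def practical(error):
    # Build a replacement for the unencodable character. In general this
    # substitution could itself contain code points that are illegal in
    # the target encoding.
    substitution = "<%04X>" % ord(error.object[error.start])
    # Guard against that case by encoding the substitution with this
    # same handler, so illegal characters in it are fixed up recursively,
    # then decode back to a str as PEP 293 requires of handlers.
    safe = substitution.encode(error.encoding, "practical").decode(error.encoding)
    # Resume encoding after the character(s) that triggered the error.
    return (safe, error.end)

codecs.register_error("practical", practical)

print("caf\u00e9 \u2014 ok".encode("ascii", "practical"))
# → b'caf<00E9> <2014> ok'
```

The example returns `error.end` rather than `error.start + 1`; for a
handler called on one character at a time the two are equivalent, but
`error.end` also copes with a multi-character error range.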