On Wed, Jun 29, 2016 at 1:52 AM, Random832 wrote:
> On Tue, Jun 28, 2016, at 06:25, Chris Angelico wrote:
>> For the OP's situation, frankly, I doubt there'll be anything other
>> than UTF-8, Latin-1, and CP-1252. The chances that someone casually
>> mixes CP-1252 with (say) CP-1254 would be vanis
On Tue, Jun 28, 2016, at 06:25, Chris Angelico wrote:
> For the OP's situation, frankly, I doubt there'll be anything other
> than UTF-8, Latin-1, and CP-1252. The chances that someone casually
> mixes CP-1252 with (say) CP-1254 would be vanishingly small. So the
> simple decode of "UTF-8, or faili
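
A minimal sketch of the "try UTF-8 first, then fall back" idea from the quoted message. The snippet above is cut off before it names the fallback, so the order used here (UTF-8, then CP-1252, then Latin-1) is an assumption based on the three encodings Chris lists, and decode_line is a hypothetical helper name.

```python
def decode_line(raw: bytes) -> str:
    """Decode one line: UTF-8 first, then CP-1252, then Latin-1 (assumed order)."""
    try:
        # Valid UTF-8 is very unlikely to be produced by accident,
        # so a successful strict decode is a strong signal.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    try:
        return raw.decode("cp1252")
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves a few bytes (0x81, 0x8D, 0x8F,
        # 0x90, 0x9D) undefined; Latin-1 maps every byte, so it never fails.
        return raw.decode("latin-1")
```

Used in the loop from the thread, this would replace the bare l.decode('utf-8') call on each line read from sys.stdin.buffer.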
On Tue, 28 Jun 2016 10:30 pm, Michael Welle wrote:
> I changed the code from my initial mail to:
>
> LOGGER = logging.getLogger()
> LOGGER.addHandler(logging.FileHandler("tmp.txt", encoding="utf-8"))
>
> for l in sys.stdin.buffer:
>     l = l.decode('utf-8')
>     LOGGER.critical(l)
I imagine y
On Tue, Jun 28, 2016, at 10:52, Steven D'Aprano wrote:
> "you will find THREE OR FOUR different encodings in one email.
> I think at the sending side they just glue different text
> fragments from different sources together without thinking
> about the encoding"
>
> But I'm not
On Tue, 28 Jun 2016 10:30 pm, Michael Welle wrote:
> I look at the hex values of the bytes, get the win-1252 table and
> translate the bytes to chars. If the result makes sense, it's win-1252
> (and maybe others, if the tables overlap). So in that sense I know what
> I have. At least for this exper
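
The manual check described in the quoted message (look at the hex values, try a code page table, see whether the result makes sense) can be roughly automated. This is only a sketch: candidate_decodings is a hypothetical helper, and the list of encodings tried is an assumption, not something taken from the thread.

```python
def candidate_decodings(raw: bytes,
                        encodings=("utf-8", "cp1252", "cp1254", "latin-1")):
    """Return {encoding: decoded text, or None if strict decoding fails}."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


sample = b"\x93quoted\x94"
print(sample.hex())  # the hex values one would look up in the table
for name, text in candidate_decodings(sample).items():
    print(name, repr(text))
```

For this sample, CP-1252 and CP-1254 give the same curly quotes, which illustrates the "tables overlap" caveat: a plausible result under one code page does not rule out the others.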
Michael Welle wrote:
> With your help, I fixed logging. Somehow I had in mind that the
> logging module would do the right thing if I don't specify the encoding.
The default encoding depends on the environment (and platform):
$ touch tmp.txt
$ python3 -c 'print(open("tmp.txt").encoding)'
UTF-8
$
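
A small sketch of the same point from inside Python. In the Python 3 versions current at the time of this thread, open() without an encoding argument uses locale.getpreferredencoding(False), so the result varies by machine; passing encoding="utf-8" to logging.FileHandler removes that dependency. The logger name and the log_and_read_back helper are hypothetical, chosen only for illustration.

```python
import locale
import logging
import os
import tempfile

# What open() falls back to when no encoding is given (environment dependent).
print(locale.getpreferredencoding(False))


def log_and_read_back(message: str) -> bytes:
    """Log one message through a UTF-8 FileHandler and return the raw file bytes."""
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    logger = logging.getLogger("encoding_demo")  # hypothetical logger name
    handler = logging.FileHandler(path, encoding="utf-8")
    logger.addHandler(handler)
    try:
        logger.critical(message)
    finally:
        logger.removeHandler(handler)
        handler.close()
    try:
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

With the explicit encoding, the bytes written for a message such as "naïve €" should be its UTF-8 encoding regardless of the locale the script runs under.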
On Tue, 28 Jun 2016 08:17 pm, Michael Welle wrote:
> After a bit more 'fiddling' I found out that all test cases work if I
> use .decode('utf-8') on the incoming bytes. In my first approach I tried
> to find out at what I was looking and then used a specific .decode, e.g.
> .decode('windows-1252')
On Tue, Jun 28, 2016 at 8:37 PM, Michael Welle wrote:
> Steven D'Aprano writes:
>
>> On Tue, 28 Jun 2016 06:35 pm, Michael Welle wrote:
>>
>>> my original data is email. The mail header says it's utf-8, but you will
>>> find three or four different encodings in one email. I think at the
>>> sendi
On Tue, 28 Jun 2016 06:35 pm, Michael Welle wrote:
> my original data is email. The mail header says it's utf-8, but you will
> find three or four different encodings in one email. I think at the
> sending side they just glue different text fragments from different
> sources together without think
On Tue, Jun 28, 2016 at 6:30 PM, Peter Otten <__pete...@web.de> wrote:
> Does chardet ever return an encoding that fails to decode
> the line? Only in that case would the "ignore" error handler make sense.
Assuming the module the OP is using is functionally identical to the
one I use from the comm
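
To illustrate Peter's question without depending on chardet itself: the sketch below takes an already guessed encoding name as a plain string, decodes strictly first, and only reaches for a lenient error handler if that fails. decode_with_guess is a hypothetical helper, and using "replace" rather than "ignore" is a deliberate substitution so that damaged bytes stay visible.

```python
def decode_with_guess(raw: bytes, guessed: str):
    """Return (text, ok); ok is False if the guessed encoding failed strictly."""
    try:
        return raw.decode(guessed), True
    except UnicodeDecodeError:
        # The guess was wrong for at least one byte. "replace" marks each
        # bad sequence with U+FFFD; "ignore" would drop it silently.
        return raw.decode(guessed, errors="replace"), False


print(decode_with_guess(b"caf\xe9", "latin-1"))
print(decode_with_guess(b"caf\xe9", "utf-8"))
print(repr(b"caf\xe9".decode("utf-8", errors="ignore")))
```

If the strict decode never fails for the guesses a detector returns, the lenient branch is never taken, which is the case in which the error handler would be unnecessary.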
Michael Welle wrote:
> Hello,
>
> I want to use Python 3 to process data that unfortunately come with
> different encodings. So far I have found ascii, iso-8859, utf-8,
> windows-1252 and maybe some more in the same file (don't ask...). I read
> the data via sys.stdin and the idea is to read a l
On Tue, Jun 28, 2016 at 5:25 PM, Michael Welle wrote:
> I want to use Python 3 to process data that unfortunately come with
> different encodings. So far I have found ascii, iso-8859, utf-8,
> windows-1252 and maybe some more in the same file (don't ask...). I read
> the data via sys.stdin and th