On Wed, May 27, 2015 at 3:57 PM, Laura Creighton <l...@openend.se> wrote:
> ------- Forwarded Message
>
> From: Chris Angelico <ros...@gmail.com>
> Cc: "python-list@python.org" <python-list@python.org>
>
> On Wed, May 27, 2015 at 9:52 PM, anatoly techtonik <techto...@gmail.com>
> wrote:
>> And the short answer is that we need unicode because we are printing this
>> information to the stdout, and stdout is opened in text mode at least on
>> Windows, and without explicit conversion, Python will try to decode stuff
>> as being `ascii` and fail anyway.
>
> So you're working with text.

No. It is unknown. I am printing the Nodes of the SCons build graph, and I
don't know how the Nodes are represented. In my case it turned out that a Node
contained Russian text, which crashed SCons. It could contain Russian text in
cp1251 or in utf-8 or in KOI-8, and I can't guess all the possible encodings
there. I just need to print that tree without a crash or information loss.

> That means you HAVE to decode it somehow;
> you fundamentally cannot print bytes to the console. Lossless
> concealment of arbitrary bytes won't help you.

Won't help me with what? I am debugging build scripts to find out the
*structure* of my dependencies, and then all of a sudden Python crashes with a
UnicodeDecodeError, leaving me pronouncing bad Russian curses aloud. It is not
only less forgiving than Java, but also more treacherous, because of its
run-time nature. It would surely help preserve my zen if Python could just
flow through the nodes of this graph. Garbage is okay - I can clean it up or
remove it if it stands in the way; just don't disrupt my flow or tell me that
now I want to deal with UnicodeDecodeError. Because I don't.

> If you can't adequately
> decode everything, either backslash-escape the rest, or use a
> replacement character; you can't print out those bytes.

Yes. How do I backslash-escape the rest in Python 2? In Python 3 there is some
freaky "surrogateescape" error strategy, but what to do in Python 2? A
replacement character is not a solution, because it loses data, and if I want
to post-process the graph log, I won't be able to recover the missing bits.

> And no, I will not cc you. Subscribe to the list if you're going to
> ask a question.

Added Mailman to my suxx tracker:
https://github.com/techtonik/suxx-tracker#mailman
-- 
anatoly t.
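
On the question of backslash-escaping undecodable bytes in Python 2: one possible approach, sketched here and not taken from the thread, is to register a custom decode error handler with `codecs.register_error`. The handler name `escape_undecodable` and the helper `to_printable` are made up for illustration, and UTF-8 as the default encoding is an assumption. The sketch is written to run on both Python 2 and Python 3. It is not strictly lossless: a literal backslash sequence such as `\xff` already present in the input cannot be told apart from an escaped byte afterwards.

```python
import codecs


def escape_undecodable(exc):
    """Decode error handler: turn each undecodable byte into a \\xNN escape."""
    if not isinstance(exc, UnicodeDecodeError):
        # Only decoding errors are handled; re-raise anything else.
        raise exc
    # bytearray() yields integers on both Python 2 (str) and Python 3 (bytes).
    bad = bytearray(exc.object[exc.start:exc.end])
    escaped = u"".join(u"\\x%02x" % b for b in bad)
    # Return the replacement text and the position where decoding resumes.
    return escaped, exc.end


# The handler name is arbitrary (hypothetical, chosen for this sketch).
codecs.register_error("escape_undecodable", escape_undecodable)


def to_printable(raw, encoding="utf-8"):
    """Decode raw bytes, escaping whatever the codec cannot decode."""
    return raw.decode(encoding, "escape_undecodable")
```

For example, `to_printable(b"node \xcf\xf0")` returns the text `node \xcf\xf0` with the two bad bytes spelled out as escapes, while valid UTF-8 input comes back decoded as usual. On Python 3.5 and later the built-in `"backslashreplace"` handler does the same job for decoding, so the custom handler is mainly of interest on Python 2. Printing the result can still fail if the console encoding cannot represent the decoded characters, so the output side may need its own `encode(..., "backslashreplace")` step.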