On Tuesday, 22 May 2018 17:23:55 UTC-4, Peter J. Holzer wrote:
> On 2018-05-20 15:43:54 +0200, Karsten Hilbert wrote:
> > On Sun, May 20, 2018 at 04:59:12AM -0700, bellcanada...@gmail.com wrote:
> >
> > > On Saturday, 19 May 2018 19:48:20 UTC-4, Skip Montanaro wrote:
> > > > As Chris indicated, you'll have to figure out the correct encoding. You
> > > > might want to check out the chardet module (available on PyPI, I
> > > > believe)
> > > > and see if it can come up with a better guess. I imagine there are other
> > > > encoding guessers out there. That's just one I'm familiar with.
> > > >
> > > thank you for the reply, but how exactly am i supposed to find oout what
> > > is the correct encodeing??
> >
> > One CAN NOT.
> >
> > The best you can do is to go ask the canonical source of the
> > file what encoding the file is _supposed_ to be in.
>
> I disagree on both counts.
>
> 1) For any given file it is almost always possible to find the correct
>    encoding (or *a* correct encoding, as there may be more than one).
>
>    This may require domain-specific knowledge (e.g. it may be necessary
>    to recognize the human language and know at least some distinctive
>    words, or to know some special symbols likely to be used in a data
>    file), and it almost always takes a bit of detective work and trial
>    and error. But I don't think I ever encountered a file where I
>    couldn't figure out the encoding.
>
>    (If you have several files in the same encoding, it may not be
>    possible to figure out the encoding from a subset of them. For
>    example, the files may all be in ISO-8859-2, but the subset you have
>    contains only characters <= 0x7F. But if you have several files, they
>    may not all be the same encoding, either).
>
> 2) The canonical source of the file may not know. This is quite frequent
>    when the source is some non-technical person. Then you get answers
>    like "it's ASCII" (although the file contains umlauts, which aren't
>    in ASCII) or "it's ANSI" (which isn't an encoding, although Windows
>    pretends it is). Or they may not be aware that the file is converted
>    somewhere in the pipeline, to that the file they generated isn't
>    actually the file you received. So ask (or check the docs), but
>    verify!
>
> hp
>
> --
>    _  | Peter J. Holzer    | we build much bigger, better disasters now
> |_|_) |                    | because we have much more sophisticated
> | |   | h...@hjp.at        | management tools.
> __/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>

Hello Peter,

How exactly would I solve this issue? I have a script that works in
Python 2 but not in Python 3. I ran 2to3.py on it, but I still get the
error: a "character undefined" UnicodeDecodeError, can't decode byte 1x09
in line 7414, from cp1252. Would you have a straightforward solution? I
can't get a straight answer. It was ported from ANSI to Python, so it's
UTF-8 as far as I can see.

--
https://mail.python.org/mailman/listinfo/python-list
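
For what it's worth, the trial-and-error approach described in the quoted message can be sketched roughly as follows. This is only a minimal illustration, not anything posted in the thread: the function name `probe_encodings` and the candidate list are made up for the example, and it assumes the whole file fits in memory as raw bytes.

```python
def probe_encodings(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Try to decode raw bytes with each candidate encoding.

    Returns a dict mapping each encoding name to None if decoding
    succeeded, or to (byte_offset, line_number, offending_bytes) for
    the first failure. The candidate list is an assumption; extend it
    with whatever encodings are plausible for the file's origin.
    """
    results = {}
    for enc in candidates:
        try:
            data.decode(enc)
            results[enc] = None
        except UnicodeDecodeError as exc:
            # Count newlines before the failing byte to get a 1-based line number.
            line_no = data.count(b"\n", 0, exc.start) + 1
            results[enc] = (exc.start, line_no, data[exc.start:exc.end])
    return results
```

To use it, read the file in binary mode (for example `open(path, "rb").read()`, where `path` is whatever file the script is choking on) and pass the bytes in. A clean decode is necessary but not sufficient: latin-1 maps every byte to some character and so never fails, which means the decoded text still has to be checked by eye, as the quoted message suggests. Once a plausible encoding is found, it can be passed explicitly, e.g. `open(path, encoding="utf-8")`, instead of relying on the platform default, which on Windows is typically cp1252.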