On Tue, Aug 22, 2017 at 5:15 PM, Gregory Ewing wrote:
> Chris Angelico wrote:
>>
>> a naive ASCII upper-casing wouldn't produce 0x81 either - if it did, it
>> would also convert 0x21 ("!") into 0x01 (SOH, a control character). So
>> this one's still a mystery.
>
>
> It's unlikely that even a naive
Chris Angelico wrote:
a naive ASCII upper-casing wouldn't produce 0x81 either - if it did, it
would also convert 0x21 ("!") into 0x01 (SOH, a control character). So
this one's still a mystery.
It's unlikely that even a naive ASCII upper/lower casing algorithm
would be *that* naive; it would have
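To make the point concrete, here is a hypothetical sketch (not code from any message in the thread) of the two flavours of "naive" ASCII upper-casing being contrasted: one that blindly clears bit 0x20 of every byte, which would indeed turn 0xA1 into 0x81 but would also turn '!' (0x21) into SOH (0x01), and a range-checked one that cannot produce 0x81 at all.

def brutal_upper(data: bytes) -> bytes:
    # Clear bit 0x20 unconditionally: 0xA1 -> 0x81, but also 0x21 ('!') -> 0x01 (SOH).
    return bytes(b & ~0x20 for b in data)

def naive_ascii_upper(data: bytes) -> bytes:
    # Range-checked version: only bytes in a-z (0x61-0x7A) are touched,
    # so neither 0xA1 nor 0x21 is ever altered.
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in data)

print(brutal_upper(b'\xa1!'))       # b'\x81\x01'
print(naive_ascii_upper(b'\xa1!'))  # b'\xa1!' (unchanged)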
Ian Kelly wrote:
One possibility is that it's the same two bytes. That would make it
0xE2 0x80 0x9D which is "right double quotation mark". Since it keeps
appearing after ending double quotes that seems plausible, although
one has to wonder why it appears *in addition to* the ASCII double
quotes.
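A quick check of that theory (illustrative snippet, not from the thread): 0x9D is the final byte of the UTF-8 encoding of U+201D, and it is one of the code points left undefined in Windows-1252, which is why it survives as a bare byte instead of mapping to a printable character.

import unicodedata

ch = b'\xe2\x80\x9d'.decode('utf-8')
print(ch, unicodedata.name(ch))   # RIGHT DOUBLE QUOTATION MARK

# 0x9D is undefined in Windows-1252, so decoding the stray byte as cp1252 fails:
try:
    b'\x9d'.decode('cp1252')
except UnicodeDecodeError as e:
    print(e)   # 'charmap' codec can't decode byte 0x9d ... character maps to <undefined>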
On 08/17/2017 05:53 PM, Chris Angelico wrote:
On Fri, Aug 18, 2017 at 10:30 AM, John Nagle wrote:
On 08/17/2017 05:14 PM, John Nagle wrote:
I'm cleaning up some data which has text description fields from
multiple sources.
A few more cases:
bytearray(b'\xe5\x81ukasz zmywaczyk')
This
Marko Rauhamaa writes:
> Chris Angelico :
>
>> Ohh. We have no evidence that uppercasing is going on here, and a
>> naive ASCII upper-casing wouldn't produce 0x81 either - if it did, it
>> would also convert 0x21 ("!") into 0x01 (SOH, a control character). So
>> this one's still a mystery.
>
> BT
On Fri, Aug 18, 2017, at 03:39, Marko Rauhamaa wrote:
> BTW, I was reading up on the history of ASCII control characters. Quite
> fascinating.
>
> For example, have you ever wondered why DEL is the odd control character
> out at the code point 127? The reason turns out to be paper punch tape.
> By
On 2017-08-18 04:46, John Nagle wrote:
On 08/17/2017 05:53 PM, Chris Angelico wrote:
> On Fri, Aug 18, 2017 at 10:30 AM, John Nagle wrote:
>> On 08/17/2017 05:14 PM, John Nagle wrote:
>>> I'm cleaning up some data which has text description fields from
>>> multiple sources.
>> A few
On Fri, Aug 18, 2017 at 5:39 PM, Marko Rauhamaa wrote:
> Chris Angelico :
>
>> Ohh. We have no evidence that uppercasing is going on here, and a
>> naive ASCII upper-casing wouldn't produce 0x81 either - if it did, it
>> would also convert 0x21 ("!") into 0x01 (SOH, a control character). So
>> thi
Chris Angelico :
> Ohh. We have no evidence that uppercasing is going on here, and a
> naive ASCII upper-casing wouldn't produce 0x81 either - if it did, it
> would also convert 0x21 ("!") into 0x01 (SOH, a control character). So
> this one's still a mystery.
BTW, I was reading up on the history
On Fri, Aug 18, 2017 at 5:11 PM, Marko Rauhamaa wrote:
> Chris Angelico :
>
>> On Fri, Aug 18, 2017 at 4:57 PM, Marko Rauhamaa wrote:
>>> Chris Angelico :
>>>
On Fri, Aug 18, 2017 at 4:38 PM, Paul Rubin wrote:
> John Nagle writes:
>> Since, as someone pointed out, there was UTF-8
Chris Angelico :
> On Fri, Aug 18, 2017 at 4:57 PM, Marko Rauhamaa wrote:
>> Chris Angelico :
>>
>>> On Fri, Aug 18, 2017 at 4:38 PM, Paul Rubin wrote:
John Nagle writes:
> Since, as someone pointed out, there was UTF-8 which had been
> run through an ASCII-type lower casing algorithm
On Fri, Aug 18, 2017 at 4:57 PM, Marko Rauhamaa wrote:
> Chris Angelico :
>
>> On Fri, Aug 18, 2017 at 4:38 PM, Paul Rubin wrote:
>>> John Nagle writes:
Since, as someone pointed out, there was UTF-8 which had been
run through an ASCII-type lower casing algorithm
>>>
>>> I spent a few
Chris Angelico :
> On Fri, Aug 18, 2017 at 4:38 PM, Paul Rubin wrote:
>> John Nagle writes:
>>> Since, as someone pointed out, there was UTF-8 which had been
>>> run through an ASCII-type lower casing algorithm
>>
>> I spent a few minutes figuring out if some of the mysterious 0x81's
>> could be
On Fri, Aug 18, 2017 at 4:38 PM, Paul Rubin wrote:
> John Nagle writes:
>> Since, as someone pointed out, there was UTF-8 which had been
>> run through an ASCII-type lower casing algorithm
>
> I spent a few minutes figuring out if some of the mysterious 0x81's
> could be from ASCII-lower-casing some Unicode combining characters
John Nagle writes:
> Since, as someone pointed out, there was UTF-8 which had been
> run through an ASCII-type lower casing algorithm
I spent a few minutes figuring out if some of the mysterious 0x81's
could be from ASCII-lower-casing some Unicode combining characters, but
the numbers didn't seem
On Fri, Aug 18, 2017 at 4:24 PM, John Nagle wrote:
> I'm coming around to the idea that some of these snippets
> have been previously mis-converted, which is why they make no sense.
> Since, as someone pointed out, there was UTF-8 which had been
> run through an ASCII-type lower casing algorithm
On 08/17/2017 10:12 PM, Ian Kelly wrote:
Here's some more 0x9d usage, each from a different data item:
Guitar Pro, JamPlay, RedBana\\\'s Audition,\x9d Doppleganger\x99s The
Lounge\x9d or Heatwave Interactive\x99s Platinum Life Country,\\"
This one seems like a good hint since \x99 here looks
;))
'LATIN SMALL LETTER U WITH GRAVE'
Doesn't seem too likely.
This may help:
http://i18nqa.com/debug/bug-double-conversion.html
There's always the possibility that it's just junk, or mojibake from some other
source, so it might not be anything sensible in any extended ASCII
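The linked page describes the classic "double conversion" problem (UTF-8 bytes decoded as a single-byte charset and then re-encoded). A minimal sketch of the repair it suggests, assuming that failure mode; this is illustrative only, not code from the thread:

# UTF-8 bytes of 'Münster' mistakenly read as Latin-1 produce mojibake;
# round-tripping through Latin-1 recovers the original text.
garbled = 'MÃ¼nster'
fixed = garbled.encode('latin-1').decode('utf-8')
print(fixed)   # Münster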
On Thu, Aug 17, 2017 at 9:46 PM, John Nagle wrote:
> The 0x9d thing seems unrelated to the Polish names thing. 0x9d
> shows up in the middle of English text that's otherwise ASCII.
> Is this something that can appear as a result of cutting and
> pasting from Microsoft Word?
>
> I'd like to
On 08/17/2017 05:53 PM, Chris Angelico wrote:
> On Fri, Aug 18, 2017 at 10:30 AM, John Nagle wrote:
>> On 08/17/2017 05:14 PM, John Nagle wrote:
>>> I'm cleaning up some data which has text description fields from
>>> multiple sources.
>> A few more cases:
>>
>> bytearray(b'\xe5\x81ukasz zmywaczyk')
On Thu, Aug 17, 2017 at 8:15 PM, MRAB wrote:
> On 2017-08-18 01:53, Chris Angelico wrote:
>> So here's an insane theory: something attempted to lower-case the byte
>> stream as if it were ASCII. If you ignore the high bit, 0xC5 looks
>> like 0x45 or "E", which lower-cases by having 32 added to it,
On 2017-08-18 01:30, John Nagle wrote:
On 08/17/2017 05:14 PM, John Nagle wrote:
> I'm cleaning up some data which has text description fields from
> multiple sources.
A few more cases:
bytearray(b'miguel \xe3\x81ngel santos')
bytearray(b'lidija kmeti\xe4\x8d')
bytearray(b'\xe5\x81ukasz zmywaczyk')
On 2017-08-18 01:53, Chris Angelico wrote:
On Fri, Aug 18, 2017 at 10:30 AM, John Nagle wrote:
On 08/17/2017 05:14 PM, John Nagle wrote:
I'm cleaning up some data which has text description fields from
multiple sources.
A few more cases:
bytearray(b'\xe5\x81ukasz zmywaczyk')
This one
On 2017-08-18 01:14, John Nagle wrote:
I'm cleaning up some data which has text description fields from
multiple sources. Some are in UTF-8. Some are in WINDOWS-1252.
And some are in some other character set. So I have to examine and
sanity check each field in a database dump, deciding
John Nagle writes:
> I'm cleaning up some data which has text description fields from
> multiple sources. Some are in UTF-8. Some are in WINDOWS-1252.
> And some are in some other character set. So I have to examine and
> sanity check each field in a database dump, deciding which character
On Thu, Aug 17, 2017 at 6:53 PM, Chris Angelico wrote:
> That doesn't work for everything, though. The 0x81 0x81 and 0x9d ones
> are still a puzzle.
I'm fairly sure that b'M\x81\x81\xfcnster' is 'Münster'. It decodes to
that in Latin-1 if you remove the \x81 bytes. The question then is
what those
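A short check of that reading (illustrative only): drop the 0x81 bytes and the rest decodes cleanly as Latin-1.

raw = bytearray(b'M\x81\x81\xfcnster')
cleaned = bytes(b for b in raw if b != 0x81)   # remove the mystery 0x81 bytes
print(cleaned.decode('latin-1'))               # Münster (0xFC is ü in Latin-1/cp1252)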
On Fri, Aug 18, 2017 at 10:54 AM, Ian Kelly wrote:
> On Thu, Aug 17, 2017 at 6:52 PM, Ian Kelly wrote:
>> On Thu, Aug 17, 2017 at 6:30 PM, John Nagle wrote:
>>> A few more cases:
>>>
>>> bytearray(b'miguel \xe3\x81ngel santos')
>>
>> If that were b'\xc3\x81' it would be Á in UTF-8 which would fit the rest of the name.
On Thu, Aug 17, 2017 at 6:52 PM, Ian Kelly wrote:
> On Thu, Aug 17, 2017 at 6:30 PM, John Nagle wrote:
>> A few more cases:
>>
>> bytearray(b'miguel \xe3\x81ngel santos')
>
> If that were b'\xc3\x81' it would be Á in UTF-8 which would fit the
> rest of the name.
>
>> bytearray(b'\xe5\x81ukasz zmywaczyk')
On Thu, Aug 17, 2017 at 6:30 PM, John Nagle wrote:
> A few more cases:
>
> bytearray(b'miguel \xe3\x81ngel santos')
If that were b'\xc3\x81' it would be Á in UTF-8 which would fit the
rest of the name.
> bytearray(b'\xe5\x81ukasz zmywaczyk')
If that were b'\xc5\x81' it would be Ł in UTF-8 which
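Decoding the conjectured originals (the 0xC3/0xC5 variants Ian suggests, not the damaged bytes actually observed) confirms they would read sensibly as UTF-8:

print(b'miguel \xc3\x81ngel santos'.decode('utf-8'))   # miguel Ángel santos
print(b'\xc5\x81ukasz zmywaczyk'.decode('utf-8'))       # Łukasz zmywaczyk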
On Fri, Aug 18, 2017 at 10:30 AM, John Nagle wrote:
> On 08/17/2017 05:14 PM, John Nagle wrote:
>> I'm cleaning up some data which has text description fields from
>> multiple sources.
> A few more cases:
>
> bytearray(b'\xe5\x81ukasz zmywaczyk')
This one has to be Polish, and the first char
On Thu, Aug 17, 2017 at 6:27 PM, Chris Angelico wrote:
> On Fri, Aug 18, 2017 at 10:14 AM, John Nagle wrote:
>> I'm cleaning up some data which has text description fields from
>> multiple sources. Some are in UTF-8. Some are in WINDOWS-1252.
>> And some are in some other character set. S
On 08/17/2017 05:14 PM, John Nagle wrote:
> I'm cleaning up some data which has text description fields from
> multiple sources.
A few more cases:
bytearray(b'miguel \xe3\x81ngel santos')
bytearray(b'lidija kmeti\xe4\x8d')
bytearray(b'\xe5\x81ukasz zmywaczyk')
bytearray(b'M\x81\x81\xfcnster')
On Fri, Aug 18, 2017 at 10:14 AM, John Nagle wrote:
> I'm cleaning up some data which has text description fields from
> multiple sources. Some are in UTF-8. Some are in WINDOWS-1252.
> And some are in some other character set. So I have to examine and
> sanity check each field in a databa
I'm cleaning up some data which has text description fields from
multiple sources. Some are in UTF-8. Some are in WINDOWS-1252.
And some are in some other character set. So I have to examine and
sanity check each field in a database dump, deciding which character
set best represents what's
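For what it's worth, the usual brute-force approach to that kind of triage, sketched here as illustrative code rather than anything from the thread, is to attempt a strict UTF-8 decode first and only fall back to a single-byte charset when that fails:

def best_effort_decode(raw: bytes) -> str:
    # Illustrative heuristic only: strict UTF-8 first (it rejects most
    # single-byte text containing high bytes), then Windows-1252, then
    # Latin-1, which accepts every byte and therefore always succeeds.
    for enc in ('utf-8', 'windows-1252'):
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            pass
    return raw.decode('latin-1')

print(best_effort_decode(b'lidija kmeti\xc4\x8d'))  # valid UTF-8: lidija kmetič
print(best_effort_decode(b'M\xfcnster'))            # not UTF-8: cp1252 gives Münster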
On 2017-01-13 05:44 PM, Grant Edwards wrote:
On 2017-01-13, D'Arcy Cain wrote:
Here is the failing code:
with open(sys.argv[1], encoding="latin-1") as fp:
    for ln in fp:
        print(ln)

Traceback (most recent call last):
  File "./load_iff", line 11, in <module>
    print(ln)
UnicodeEncodeError:
On 2017-01-13, D'Arcy Cain wrote:
> I thought I was done with this crap once I moved to 3.x but some
> Winblows machines are still sending what some circles call "Extended
> ASCII". I have a file that I am trying to read and it is barfing on
> some characters. For
On 2017-01-13, D'Arcy Cain wrote:
> Here is the failing code:
>
> with open(sys.argv[1], encoding="latin-1") as fp:
>     for ln in fp:
>         print(ln)
>
> Traceback (most recent call last):
>   File "./load_iff", line 11, in <module>
>     print(ln)
> UnicodeEncodeError: 'ascii' codec can't encode character
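Note that the latin-1 decode has already succeeded by this point; the traceback is an encode error raised by print() because stdout's encoding is ASCII. A minimal workaround sketch (illustrative, not the poster's code) is to bypass the text layer, or alternatively run with PYTHONIOENCODING=utf-8 set in the environment:

import sys

with open(sys.argv[1], encoding="latin-1") as fp:
    for ln in fp:
        # print() re-encodes to sys.stdout.encoding (ASCII here) and fails on é;
        # writing UTF-8 bytes to the underlying buffer sidesteps that.
        sys.stdout.buffer.write(ln.encode('utf-8'))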
On Fri, Jan 13, 2017, at 17:24, D'Arcy Cain wrote:
> I thought I was done with this crap once I moved to 3.x but some
> Winblows machines are still sending what some circles call "Extended
> ASCII". I have a file that I am trying to read and it is barfing on
> so
I thought I was done with this crap once I moved to 3.x but some
Winblows machines are still sending what some circles call "Extended
ASCII". I have a file that I am trying to read and it is barfing on
some characters. For example:
due to the Qu\xe9bec government
Obviously should
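A one-line check of that example (illustrative only): 0xE9 is é in both Windows-1252 and Latin-1, so decoding with either recovers the intended text.

print(b'due to the Qu\xe9bec government'.decode('cp1252'))
# due to the Québec government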
even says
"There are *several* different variations of the 8-bit ASCII table."
(emphasis added), which is an understatement and a half. Wikipedia claims over
220 different "extended ASCII" encodings:
https://en.wikipedia.org/wiki/Extended_ASCII
That's more than the nu