Re: "convert" string to bytes without changing data (encoding)

Ethan Furman Wed, 28 Mar 2012 11:59:18 -0700

Peter Daum wrote:

On 2012-03-28 12:42, Heiko Wundram wrote:

Am 28.03.2012 11:43, schrieb Peter Daum:

... in my example, the variable s points to a "string", i.e. a series of
bytes, (0x61,0x62 ...) interpreted as ascii/unicode characters.

No; a string contains a series of codepoints from the unicode plane,
representing natural language characters (at least in the simplistic
view, I'm not talking about surrogates). These can be encoded to
different binary storage representations, of which ascii is (a common) one.
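Here is a short illustration of that point. It is a sketch, and the sample string is made up. The same str turns into different bytes depending on which encoding is chosen, and plain ASCII cannot represent it at all:

```python
# A str is a sequence of code points; bytes only appear once an encoding is chosen.
s = "abc\u00e4"  # 'abcä': code points 0x61, 0x62, 0x63, 0xE4

code_points = [hex(ord(c)) for c in s]
as_latin1 = s.encode("latin-1")  # one byte per code point: b'abc\xe4'
as_utf8 = s.encode("utf-8")      # U+00E4 needs two bytes in UTF-8: b'abc\xc3\xa4'

try:
    s.encode("ascii")            # ASCII has no byte for U+00E4
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(code_points, as_latin1, as_utf8, ascii_ok)
```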

What I am looking for is a general way to just copy the raw data
from a "string" object to a "byte" object without any attempt to
"decode" or "encode" anything ...

There is "logically" no raw data in the string, just a series of
codepoints, as stated above. You'll have to specify the encoding to use
to get at "raw" data, and from what I gather you're interested in the
latin-1 (or iso-8859-15) encoding, as you're specifically referencing
chars >= 0x80 (which hints at your mindset being in LATIN-land, so to
speak).
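Latin-1 suits this because it maps byte N to code point N for all 256 byte values. Decoding therefore can never fail, and encoding restores the exact bytes. Here is a quick, illustrative check of that property:

```python
# Every possible byte value, 0x00 through 0xFF.
raw = bytes(range(256))

# Decoding with latin-1 never raises: byte N simply becomes code point N.
text = raw.decode("latin-1")
identity = all(ord(ch) == b for ch, b in zip(text, raw))

# Encoding with latin-1 gives back exactly the original bytes.
roundtrip = text.encode("latin-1") == raw

print(len(text), identity, roundtrip)
```

iso-8859-15 also decodes every byte, but it assigns some bytes to different code points. For example, 0xA4 becomes the euro sign, U+20AC. Only latin-1 gives the "byte value equals code point" identity.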


The longer story of my question is: I am new to python (obviously), and
since I am not familiar with either one, I thought it would be advisable
to go for python 3.x. The biggest problem I am facing is that I am
often dealing with data that is basically text but can contain
8-bit bytes. In this case, I cannot safely assume any given encoding,
but I actually don't need to know it either: for my purposes, it would be
perfectly good enough to deal with the ASCII portions and keep anything
else unchanged.

Where is the data coming from? Files? In that case, it sounds like you will
want to decode/encode using 'latin-1', as the bulk of your text is plain
ascii and you don't really care about the upper-ascii chars.
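Here is a minimal sketch of that approach for files. It assumes the data lives in an ordinary file. The file name and sample bytes are hypothetical, and a temporary directory keeps the example self-contained. The ASCII part is edited, and the 8-bit bytes pass through unchanged:

```python
import os
import tempfile

# Sample data: mostly ASCII plus a couple of 8-bit bytes (0xFC, 0xF6) of unknown encoding.
original = b"name=M\xfcller; city=K\xf6ln\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(original)

    # Read as text with latin-1: every byte becomes one character, none is rejected.
    # newline="" disables line-ending translation so "\n" is not rewritten on output.
    with open(path, encoding="latin-1", newline="") as f:
        text = f.read()

    # Edit only the ASCII portion; the 8-bit characters are carried along untouched.
    text = text.replace("city=", "town=")

    # Write back with the same encoding, so the 8-bit bytes come out as they went in.
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write(text)

    with open(path, "rb") as f:
        result = f.read()

print(result)
```

If the non-ASCII bytes were actually UTF-8, they would look wrong inside the str. However, they would still be written back byte for byte, which is all this approach promises.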


~Ethan~
--
http://mail.python.org/mailman/listinfo/python-list
