Further, does anything, except a printing device need to know the
encoding of a piece of "text"?

I may be wrong, but I believe that's part of the idea between separation of string and bytes types in Python 3.x. I believe, if you are using Python 3.x, you don't need the character encoding mumbo jumbo at all ;-)

If you're using Python 2.x, though, I believe if you simply set the file opening mode to binary then data you read() should still be treated as an array of bytes, although you may encounter issues trying to access the n'th character.

Please do correct me if I'm wrong, anyone.

On Thu, 27 Aug 2009 09:39:06 -0700, Shashank Singh <shashank.sunny.si...@gmail.com> wrote:

Hi All!

I have a very simple (and probably stupid) question eluding me.
When exactly is the char-set information needed?

To make my question clear consider reading a file.
While reading a file, all I get is basically an array of bytes.

Now suppose a file has 10 bytes in it (all is data, no metadata,
forget the BOM and stuff for a little while). I read it into an array of 10
bytes, replace, say, 2nd bytes and write all the bytes back to a new
file.

Do i need the character encoding mumbo jumbo anywhere in this?

Further, does anything, except a printing device need to know the
encoding of a piece of "text"? I mean, as long as we are not trying
to get a symbolic representation of a "text" or get "i"th character
of it, all we need to do is to carry the intended encoding as
an auxiliary information to the data stored as byte array.

Right?

--shashank



--
Rami Chowdhury
"Never attribute to malice that which can be attributed to stupidity" -- Hanlon's Razor
408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)
--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to