On Fri, Nov 04, 2005 at 04:13:29PM +0100, Martijn van Oosterhout wrote:
> On Fri, Nov 04, 2005 at 08:38:38AM -0500, [EMAIL PROTECTED] wrote:
> > On Thu, Nov 03, 2005 at 09:17:43PM -0500, Tom Lane wrote:
> > > Actually, the real reason we use UTF-8 and not any of the
> > > sorta-fixed-size representations of Unicode is that the backend is by
> > > and large an ASCII, null-terminated-string engine. *All* of the
> > > supported backend encodings are ASCII-superset codes. Making
> > > everything null-safe in order to allow use of UCS2 or UCS4 would be
> > > a huge amount of work, and the benefit is at best questionable.
> > Perhaps on a side note - my intuition (which sometimes lies) would tell
> > me that, if the above is true, the backend is doing unnecessary copies
> > of read-only data, if only, to insert a '\0' at the end of the strings.
> > Is this true?
> It's not quite that bad. Obviously for all on disk datatype zeros are
> allowed. Bit strings, arrays, timestamps, numerics can all have
> embedded nulls and they have a length header.
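
To make the copy I am asking about concrete, here is a rough sketch in C. The `LenPrefixed` struct and both functions are made up for illustration; this is not PostgreSQL's actual varlena layout or any backend function, only the general shape of "length header" data versus a '\0'-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-prefixed value (an assumption for illustration,
 * NOT the real varlena format). The payload carries no terminator and
 * may legally contain embedded '\0' bytes. */
typedef struct {
    size_t len;        /* number of payload bytes */
    const char *data;  /* payload, not '\0'-terminated */
} LenPrefixed;

/* The copy in question: duplicate the payload into a fresh buffer and
 * append '\0' so that code expecting a C string can use it.
 * Returns NULL if allocation fails; the caller frees the result. */
char *to_cstring(const LenPrefixed *src)
{
    assert(src != NULL);
    char *out = malloc(src->len + 1);
    if (out == NULL)
        return NULL;
    if (src->len > 0)
        memcpy(out, src->data, src->len);
    out[src->len] = '\0';
    return out;
}

/* The alternative: work on the length-prefixed bytes in place, so no
 * copy and no terminator are needed for this operation. */
int lp_equal(const LenPrefixed *a, const LenPrefixed *b)
{
    if (a->len != b->len)
        return 0;
    return a->len == 0 || memcmp(a->data, b->data, a->len) == 0;
}
```

If most string handling looks like `to_cstring`, every use pays for an allocation and a copy. If it looks like `lp_equal`, the data can be read where it sits.
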

Are you and Tom conflicting in opinion? :-)

I read "the backend is by and large an ASCII, null-terminated-string
engine" together with "we use UTF-8 [for varlena strings?]" to mean that
a lot of the code assumes varlena strings are '\0'-terminated. My own
assumption is that varlena strings are not stored in the backend with a
'\0' terminator, so they have to be copied out and terminated with a
'\0' before they can be used.

Or perhaps I'm just confused. :-)

> > I'm thinking along the lines of the other threads that speak of PostgreSQL
> > being CPU or I/O bound, not disk bound, for many sorts of operations. Is
> > PostgreSQL unnecessary copying string data around (and other data, I would
> > assume).
> Well, there is a bit of copying around while creating tuples and such,
> but it's not to add null terminators.

How much effort (in past discussions that I may have missed, from a
decade ago? hehe) has been put into determining whether a zero-copy
architecture, or really a minimum-copy architecture, would address some
of these bottlenecks? Am I dreaming? :-)

Cheers,
mark

--
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED]
Neighbourhood Coder
Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/