"William ZHANG" <[EMAIL PROTECTED]> writes: > Sorry. I still cannot understand why backend encodings must have this > property. AFAIK, the parser treats characters as ASCII. So any multi-byte > characters will be treated as two or more ASCII characters. But if > the multi-byte encoding doesnot use any special ASCII characters like > single quote('), double quote(") and backslash(\), I think the parser > can deal with it correctly.
You've got your attention too narrowly focused on strings inside quotes;
it's strings outside quotes that are the problem.  As an example, I see
that gb18030 defines characters like 97 7e.  If someone tried to use that
as a character of a SQL identifier --- something that'd work fine for the
UTF8 equivalent e6 a2 a1 --- the parser would see it as an identifier byte
followed by the operator ~.

Similarly, there are problems if we were to allow these character sets for
the pattern argument of a regular expression operator, or for any datatype
at all that can be embedded in an array constant.  And for PL languages
that feed prosrc strings into external interpreters, such as Perl or R,
it gets really interesting really quickly :-(.

It is possible that some of these encodings could be allowed without any
risk, but I don't think it is worth our time to grovel through every valid
character and every possible backend situation to determine safety.  The
risks are not always obvious --- see for instance the security holes we
fixed about a year ago in 8.1.4 et al --- and so I for one would never
have much faith that no holes remain.  The rule "no ASCII-aliasing
characters" is a simple one that we can have some confidence in.

			regards, tom lane
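[Editor's note: the byte-level aliasing Tom describes can be demonstrated with any byte-oriented view of the two encodings. A minimal sketch in Python, using its bundled gb18030 and UTF-8 codecs; the specific code points are illustrative, following the bytes cited in the message:]

```python
# UTF-8 was designed so that every byte of a multi-byte character has
# the high bit set (0x80-0xFF): no byte of such a character can be
# mistaken for ASCII punctuation like the ~ operator.
utf8_bytes = bytes([0xE6, 0xA2, 0xA1])        # the UTF8 sequence cited above
ch_utf8 = utf8_bytes.decode("utf-8")          # decodes as one character
assert len(ch_utf8) == 1
assert all(b >= 0x80 for b in utf8_bytes)     # no ASCII aliasing possible

# GB18030 (like GBK) allows the trailing byte of a two-byte character
# to fall in 0x40-0x7E, i.e. inside the ASCII range.  97 7e is one such
# valid character whose second byte is the ASCII tilde.
gb_bytes = bytes([0x97, 0x7E])
ch_gb = gb_bytes.decode("gb18030")            # a single valid character
assert len(ch_gb) == 1
assert gb_bytes[1:] == b"~"                   # a byte-wise parser sees the ~ operator
```

A lexer that scans byte by byte, as an ASCII-assuming SQL parser does outside of quoted strings, would split the gb18030 sequence into an identifier byte plus `~`, exactly the failure mode described above.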