Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Perhaps we're talking at cross purposes.
> The problem with doing encoding validation in scan.l is that it lacks
> context. Null bytes are only the tip of the bytea iceberg, since any
> arbitrary sequence of bytes can be valid for a bytea.

If you think that, then we're definitely talking at cross purposes. I assert we should require the post-scanning value of a string literal to be valid in the database encoding. If you want to produce an arbitrary byte sequence within a bytea value, the way to get there is for the bytea input function to do the de-escaping, not for the string literal parser to do it. The current situation where there is overlapping functionality is a bit unfortunate, but once standard_conforming_strings is on by default, it'll get a lot easier to work with. I'm not eager to contort APIs throughout the backend in order to produce a more usable solution for the standard_conforming_strings = off case, given the expected limited lifespan of that usage.

The only reason I was considering not doing it in scan.l is that scan.l's behavior ideally shouldn't depend on any changeable variables. But until there's some prospect of database_encoding actually being mutable at run time, there's not much point in investing a lot of sweat on that either.

> I still don't see why it's OK for us to do validation from the foo_recv
> functions but not the corresponding foo_in functions.

Basically, a CSTRING handed to an input function should already have been encoding-validated by somebody. The most obvious reason why this must be so is the embedded-null problem, but in general it will already have been validated (typically as part of a larger string such as the whole SQL query or whole COPY data line), and doing that over is pointless and expensive.
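To make the post-scanning point concrete, here is a rough sketch in Python rather than backend C. It is not the scan.l logic: deescape() and is_valid() are hypothetical names, only octal and doubled-backslash escapes are handled, and UTF-8 stands in for the database encoding. It shows a literal that passes validation as typed but fails once its escapes are expanded.

```python
import re

def deescape(raw: bytes) -> bytes:
    """Expand \\ooo octal escapes and doubled backslashes (simplified,
    hypothetical stand-in for what a string-literal scanner does)."""
    def repl(m):
        if m.group(1) is not None:
            # Octal escape: keep only the low 8 bits.
            return bytes([int(m.group(1), 8) & 0xFF])
        return b"\\"  # doubled backslash collapses to one
    return re.sub(rb"\\(?:([0-7]{1,3})|\\)", repl, raw)

def is_valid(data: bytes, encoding: str = "utf-8") -> bool:
    """Reject embedded nulls and byte sequences invalid in the encoding."""
    if b"\x00" in data:
        return False
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# As typed in the query, both literals are plain ASCII and so validate.
good = rb"caf\303\251"       # escapes expand to a valid UTF-8 sequence
bad = rb"caf\303\251 \377"   # \377 expands to 0xFF, never valid in UTF-8

for literal in (good, bad):
    print(is_valid(literal), is_valid(deescape(literal)))
```

Under these assumptions the check has to run on the output of deescape(), since validating only the query text as received would accept both literals.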
On the other hand, the entire point of a recv function is that it gets raw data that no one else in the backend knows the format of; so if the data is to be considered textual, the recv function has to be the one that considers it so and invokes appropriate conversion or validation.

The reason backslash escapes in string literals are a problem is that they can produce incorrect-encoding results from what had been a validated string.

> At least in the
> short term that would provide us with fairly complete protection against
> accepting invalidly encoded data into the database, once we fix up
> chr(), without having to mess with the scanner, parser, COPY code etc.

Instead, we have to mess with an unknown number of UDTs ...

			regards, tom lane