On 4 October 2011 22:57, Alex Hunsaker <bada...@gmail.com> wrote:
> On Tue, Oct 4, 2011 at 03:09, Amit Khandekar
> <amit.khande...@enterprisedb.com> wrote:
>> On 4 October 2011 14:04, Alex Hunsaker <bada...@gmail.com> wrote:
>>> On Mon, Oct 3, 2011 at 23:35, Amit Khandekar
>>> <amit.khande...@enterprisedb.com> wrote:
>>>
>>>> WHen GetDatabaseEncoding() != PG_UTF8 case, ret will not be equal to
>>>> utf8_str, so pg_verify_mbstr_len() will not get called. [...]
>>>
>>> Consider a latin1 database where utf8_str was a string of ascii
>>> characters. [...]
>>>
>>> [Patch] Look ok to you?
>>>
>>
>> +    if(GetDatabaseEncoding() == PG_UTF8)
>> +        pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>>
>> In your patch, the above will again skip mb-validation if the database
>> encoding is SQL_ASCII. Note that in pg_do_encoding_conversion returns
>> the un-converted string even if *one* of the src and dest encodings is
>> SQL_ASCII.
>
> *scratches head* I thought the point of SQL_ASCII was no encoding
> conversion was done and so there would be nothing to verify.
>
> Ahh I see looks like pg_verify_mbstr_len() will make sure there are no
> NULL bytes in the string when we are a single byte encoding.
>
>> I think :
>>     if (ret == utf8_str)
>> +   {
>> +       pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>>         ret = pstrdup(ret);
>> +   }
>>
>> This (ret == utf8_str) condition would be a reliable way for knowing
>> whether pg_do_encoding_conversion() has done the conversion at all.
>
> Yes. However (and maybe im nitpicking here), I dont see any reason to
> verify certain strings twice if we can avoid it.
>
> What do you think about:
> +    /*
> +     * when we are a PG_UTF8 or SQL_ASCII database pg_do_encoding_conversion()
> +     * will not do any conversion or verification. we need to do it manually instead.
> +     */
> +    if( GetDatabaseEncoding() == PG_UTF8 ||
> +        GetDatabaseEncoding() == SQL_ASCII)
> +        pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>
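Amit's `ret == utf8_str` argument rests on the pass-through path handing back the caller's own pointer. The sketch below models only that idea and is not PostgreSQL code: `enc_t`, `fake_convert()` and `conversion_skipped()` are hypothetical names, and the pass-through rule (source equals destination, or either side is SQL_ASCII) is taken from the thread rather than from the backend source. The "conversion" is just a copy into a fresh buffer.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoding identifiers; not PostgreSQL's pg_enc values. */
typedef enum { ENC_SQL_ASCII, ENC_UTF8, ENC_LATIN1 } enc_t;

/*
 * Hypothetical model of the pass-through rule described in the thread:
 * if from == to, or either side is SQL_ASCII, the input pointer is
 * returned untouched (no conversion, no validation). Otherwise a newly
 * allocated buffer is returned; here the "conversion" is a plain copy.
 */
static char *fake_convert(const char *src, size_t len, enc_t from, enc_t to)
{
    if (from == to || from == ENC_SQL_ASCII || to == ENC_SQL_ASCII)
        return (char *) src;            /* same pointer: nothing was done */

    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, src, len);
    out[len] = '\0';
    return out;                         /* different pointer: converted */
}

/* True when fake_convert() took the pass-through path. */
static bool conversion_skipped(const char *ret, const char *input)
{
    return ret == input;
}
```

Pointer identity works as the signal because a real conversion always produces a new buffer, so `ret == input` can only be true on the pass-through path.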
You mean the final changes in plperl_helpers.h would look something like this, right?

static inline char *
utf_u2e(const char *utf8_str, size_t len)
{
    char *ret = (char *) pg_do_encoding_conversion((unsigned char *) utf8_str,
                                                   len, PG_UTF8,
                                                   GetDatabaseEncoding());

    if (ret == utf8_str)
+   {
+       if (GetDatabaseEncoding() == PG_UTF8 ||
+           GetDatabaseEncoding() == PG_SQL_ASCII)
+       {
+           pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
+       }
+
        ret = pstrdup(ret);
+   }

    return ret;
}

Yeah, I am OK with that. It's just an additional check, on top of (ret == utf8_str), to decide whether we really need to validate.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
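The agreed condition can be modelled on its own. Everything below is a hypothetical sketch, not backend code: `db_enc_t`, `utf8_valid()` and `needs_manual_verify()` are invented names, and `utf8_valid()` is only a rough stand-in for what `pg_verify_mbstr_len(PG_UTF8, ...)` enforces. It rejects embedded NUL bytes, stray continuation bytes, truncated sequences, overlong forms, surrogates and code points above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical database-encoding identifiers. */
typedef enum { DB_SQL_ASCII, DB_UTF8, DB_LATIN1 } db_enc_t;

/* Rough UTF-8 validity check; an illustration, not PostgreSQL's verifier. */
static bool utf8_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;
        unsigned int cp;

        if (c == 0x00)
            return false;                       /* embedded NUL byte */
        if (c < 0x80) {
            i++;                                /* plain ASCII */
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 2; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            n = 3; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            n = 4; cp = c & 0x07;
        } else {
            return false;                       /* stray continuation or bad lead byte */
        }
        if (i + n > len)
            return false;                       /* truncated sequence */
        for (size_t k = 1; k < n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;                   /* expected a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((n == 2 && cp < 0x80) || (n == 3 && cp < 0x800) ||
            (n == 4 && cp < 0x10000))
            return false;                       /* overlong encoding */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;                       /* surrogate or out of range */
        i += n;
    }
    return true;
}

/*
 * Mirrors the final condition in the mail: validate manually only when no
 * conversion happened (ret == input) and the database encoding is UTF8 or
 * SQL_ASCII, the two cases where the thread says nothing was verified.
 */
static bool needs_manual_verify(const char *ret, const char *input, db_enc_t db)
{
    return ret == input && (db == DB_UTF8 || db == DB_SQL_ASCII);
}
```

Checking the encoding in addition to pointer identity is what avoids verifying the same string twice, which was Alex's concern.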