Juan José Santamaría Flecha <juanjo.santama...@gmail.com> writes:
> [ 0001-Allow-localized-month-names-to_date-v6.patch ]
I took a quick look through this.

One thing I completely don't understand is why it's sufficient for
seq_search_localized to do "initcap" semantics.  Shouldn't it cover the
all-upper and all-lower cases as well, as the existing seq_search code
does?  (That is, why is it okay to ignore the "type" argument?)

On the other hand, you probably *should* be ignoring the "max" argument,
because AFAICS the values that are passed for that are specific to the
English ASCII spellings of the day and month names.  It seems very likely
that they could be too small for some sets of non-English names.

Related to that, the business with

+       mb_max = max * pg_encoding_max_length(encoding);
+       name_len = strlen(name);
+       name_len = name_len < mb_max ? name_len : mb_max;

seems wrong even if we assume that "max" is a sane limit on the number of
characters to consider.  This coding is likely to truncate a long "name"
string in the middle of a multibyte character, leading to bad-encoding
errors from later functions.

I also am confused by the "if (mb_max > max ...)" bit.  It looks to me
like that's an obscure way of writing "if (pg_encoding_max_length() > 1)",
which is a rather pointless check considering that the if() then goes on
to insist on encoding == PG_UTF8.

A bit further down, you have an "if (name_wlen > norm_wlen)" condition
that seems pretty fishy.  Is it really guaranteed that
unicode_normalize_kc cannot transform the string without shortening it?
I don't see that specified in its header comment, for sure.

Also, it's purely accidental if this:

    norm_name = (char *) palloc((norm_wlen + 1) * sizeof(pg_wchar));

allocates a long enough string for the conversion back to multibyte form,
because that output is not pg_wchars.  I think you want something more
like "norm_wlen * MAX_MULTIBYTE_CHAR_LEN + 1".  (I wonder whether we need
to be worrying about integer overflow in any of this.)

It seems like you could eliminate a lot of messiness by extending
localized_abbrev_days[] and related arrays by one element that's always
NULL.  That would waste, hmm, four 8-byte pointers on typical machines ---
but you're eating a lot more than 32 bytes of code to pass around the
array lengths, plus it's really ugly that the plain and localized arrays
are handled differently.

I don't think the way you're managing the "localized_names" variable in
DCH_from_char is very sensible.  The reader has to do a lot of reverse
engineering just to discover that the value is constant NULL in most
references, something that you've not helped by declaring it outside the
loop rather than inside.  I think you could drop the variable and write
constant NULL in most places, with something like

    S_TM(n->suffix) ? localized_full_days : NULL

in those from_char_seq_search calls where it's relevant.

In general I'm displeased with the lack of attention to comments.
Notably, you added arguments to from_char_seq_search without updating its
header comment ... not that that comment is a great API spec, but at the
least you shouldn't be making it worse.  I think the bug I complained of
above is directly traceable to the lack of a clear spec here for what
"max" means, so failure to keep these comments up to speed does have
demonstrable bad consequences.

			regards, tom lane
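
PS: To be concrete about a few of these points (none of what follows is
tested, and I'm guessing at the patch's variable names): on the "type"
question, what I'd expect is to fold both sides of the comparison rather
than assume the input arrives in initcap form, along the lines of this
hypothetical helper:

    static bool
    localized_name_matches(const char *name, const char *pattern, Oid collid)
    {
        /* downcase both strings so ALL-CAPS and all-lower inputs match too */
        char   *a = str_tolower(name, strlen(name), collid);
        char   *b = str_tolower(pattern, strlen(pattern), collid);
        bool    match = (strcmp(a, b) == 0);

        pfree(a);
        pfree(b);
        return match;
    }

(The real thing would still need seq_search's prefix-matching behavior
against the localized strings; the point here is only the case folding.)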
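
For the truncation problem, pg_encoding_mbcliplen() clips at a character
boundary, so something like

    mb_max = max * pg_encoding_max_length(encoding);
    name_len = pg_encoding_mbcliplen(encoding, name, strlen(name), mb_max);

would at least not chop a multibyte character in half (though, per the
above, I remain unconvinced that "max" is the right limit to apply to
localized names at all).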
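
Likewise, for the buffer that receives the conversion back to multibyte
form, I had in mind roughly

    norm_name = (char *) palloc(norm_wlen * MAX_MULTIBYTE_CHAR_LEN + 1);
    pg_wchar2mb_with_len(norm_wstr, norm_name, norm_wlen);

where "norm_wstr" stands for whatever the patch calls the pg_wchar string
coming out of unicode_normalize_kc; that is, size the buffer for the
worst-case multibyte expansion of norm_wlen characters, not for an array
of pg_wchar.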
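
And to spell out the array suggestion: declare the cached arrays in
pg_locale.c with one extra slot, say

    char   *localized_abbrev_days[7 + 1];
    char   *localized_full_days[7 + 1];
    char   *localized_abbrev_months[12 + 1];
    char   *localized_full_months[12 + 1];

and have cache_locale_time() leave the last element NULL.  Then the
search code can walk these exactly the way it walks the built-in English
arrays, stopping at the NULL, and there's no need to pass array lengths
around at all.  (I haven't checked what sizes the patch uses; the point
is the extra terminating slot.)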