Maciek Sakrejda <m.sakre...@gmail.com> writes: > Tested again, and the patch looks good. It does not accept leading or > trailing whitespace, which seems reasonable, given the unclear behavior of > to_number with other format strings. It also rejects less common Roman > spellings like "IIII". I don't feel strongly about that one way or the other, > but perhaps a test codifying that behavior would be useful to make it clear > it's intentional.
Yeah, I don't have a strong feeling about that either, but probably being strict is better. to_number has a big problem with "garbage in garbage out" already, and being lax will make that worse. A few notes from a quick read of the patch: * roman_to_int() should have a header comment, notably explaining its result convention. I find it fairly surprising that "0" means an error --- I realize that Roman notation can't represent zero, but wouldn't it be better to use -1? * Do *not* rely on toupper(). There are multiple reasons why not, but in this context the worst one is that in Turkish locales it's likely to translate "i" to "İ", on which you will fail. I'd use pg_ascii_toupper(). * I think roman_to_int() is under-commented internally too. To start with, where did the magic "15" come from? And why should we have such a test anyway --- what if the format allows for trailing stuff after the roman numeral? (That is, I think you need to fix this so that roman_to_int reports how much data it ate, instead of assuming that it must read to the end of the input.) The logic for detecting invalid numeral combinations feels a little opaque too. Do you have a source for the rules you're implementing, and if so could you cite it? * This code will get run through pgindent before commit, so you might want to revisit whether your comments will still look nice afterwards. There's not a lot of consistency in them about initial cap versus lower case or trailing period versus not, too. * roman_result could be declared where used, no? * I'm not really convinced that we need a new errcode ERRCODE_INVALID_ROMAN_NUMERAL rather than using a generic one like ERRCODE_INVALID_TEXT_REPRESENTATION. However, if we do do it like that, why did you pick the out-of-sequence value 22P07? * Might be good to have a few test cases demonstrating what happens when there's other stuff combined with the RN format spec. Notably, even if RN itself won't eat whitespace, there had better be a way to write the format string to allow that. * Further to Aleksander's point about lack of test coverage for the to_char direction, I see from https://coverage.postgresql.org/src/backend/utils/adt/formatting.c.gcov.html that almost none of the existing roman-number code paths are covered today. While it's not strictly within the charter of this patch to improve that, maybe we should try to do so --- at the very least it'd raise formatting.c's coverage score a few notches. regards, tom lane