Hi Tom, On Sat, Jan 18, 2025 at 5:27 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Hunaid Sohail <hunaidp...@gmail.com> writes: > > I’ve attached a new patch that addresses comments 2, 3, and 4 from Tomas. > > I'm still quite unhappy that this doesn't tolerate trailing > whitespace. That's not how other format patterns work, and it makes > it impossible to round-trip the output of to_char(..., 'RN'). > I think the core of the problem is that you tried to cram the logic > in where the existing "not implemented" error is thrown. But on > inspection there is nothing sane about that entire "Roman correction" > stanza -- what it does is either useless (zeroing already-zeroed > fields) or in the wrong place (if we don't want to allow other flags > with IS_ROMAN, that should be done via an error in NUMDesc_prepare, > like other similar cases). In the attached I moved the roman_to_int > call into NUM_processor's main loop so it's more like other format > patterns, and fixed it to not eat any more characters than it should. > This might allow putting back some format-combination capabilities, > but other than the whitespace angle I figure we can leave that for > people to request it. > Thank you for the tweaks to the patch. Overall, it looks great. Initially, when I looked at the test case: SELECT to_number('M CC', 'RN'); I was confused about whether it should be 'MCC'. After further inspection, I realized that the output is 1000 for 'M'. The format of the input can be a bit unclear. Perhaps we could add a note in the documentation or issue a warning in roman_to_int function when input is truncated, to clarify this behavior. + bool truncated = false; + + /* + * Collect and decode valid roman numerals, consuming at most + * MAX_ROMAN_LEN characters. We do this in a separate loop to avoid + * repeated decoding and because the main loop needs to know when it's at + * the last numeral. + */ + for (len = 0; len < MAX_ROMAN_LEN && !OVERLOAD_TEST; len++) + { + char currChar = pg_ascii_toupper(*Np->inout_p); + int currValue = ROMAN_VAL(currChar); + + if (currValue == 0) + { + truncated = true; + break; /* Not a valid roman numeral. */ + } + romanChars[len] = currChar; + romanValues[len] = currValue; + Np->inout_p++; + } + + if (len == 0) + return -1; /* No valid roman numerals. */ + + /* Check for truncation. */ + if (truncated) + { + ereport(WARNING, + (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("Input truncated to \"%.*s\"", + len, romanChars))); + } Output after change: postgres=# SELECT to_number('MMXX ', 'RN'); WARNING: Input truncated to "MMXX" to_number ----------- 2020 (1 row) postgres=# SELECT to_number(' XIV ', ' RN'); WARNING: Input truncated to "XIV" to_number ----------- 14 (1 row) postgres=# SELECT to_number('M CC', 'RN'); WARNING: Input truncated to "M" to_number ----------- 1000 (1 row) Also, some modifications to the test cases will be required if we go with these changes. Regards, Hunaid Sohail