Hi Tom,

On Sat, Jan 18, 2025 at 5:27 AM Tom Lane <t...@sss.pgh.pa.us> wrote:

> Hunaid Sohail <hunaidp...@gmail.com> writes:
> > I’ve attached a new patch that addresses comments 2, 3, and 4 from Tomas.
>
> I'm still quite unhappy that this doesn't tolerate trailing
> whitespace.  That's not how other format patterns work, and it makes
> it impossible to round-trip the output of to_char(..., 'RN').
> I think the core of the problem is that you tried to cram the logic
> in where the existing "not implemented" error is thrown.  But on
> inspection there is nothing sane about that entire "Roman correction"
> stanza -- what it does is either useless (zeroing already-zeroed
> fields) or in the wrong place (if we don't want to allow other flags
> with IS_ROMAN, that should be done via an error in NUMDesc_prepare,
> like other similar cases).  In the attached I moved the roman_to_int
> call into NUM_processor's main loop so it's more like other format
> patterns, and fixed it to not eat any more characters than it should.
> This might allow putting back some format-combination capabilities,
> but other than the whitespace angle I figure we can leave that for
> people to request it.
>

Thank you for the tweaks to the patch. Overall, it looks great.

Initially, when I looked at the test case:
SELECT to_number('M CC', 'RN');

I was confused about whether it should be 'MCC'. After further inspection,
I realized that the output is 1000 for 'M'. The format of the input can be a
bit unclear. Perhaps we could add a note in the documentation
or issue a warning in roman_to_int function when input is truncated,
to clarify this behavior.

+       bool            truncated = false;
+
+       /*
+        * Collect and decode valid roman numerals, consuming at most
+        * MAX_ROMAN_LEN characters.  We do this in a separate loop to avoid
+        * repeated decoding and because the main loop needs to know when
it's at
+        * the last numeral.
+        */
+       for (len = 0; len < MAX_ROMAN_LEN && !OVERLOAD_TEST; len++)
+       {
+               char            currChar = pg_ascii_toupper(*Np->inout_p);
+               int                     currValue = ROMAN_VAL(currChar);
+
+               if (currValue == 0)
+               {
+                       truncated = true;
+                       break;                          /* Not a valid
roman numeral. */
+               }
+               romanChars[len] = currChar;
+               romanValues[len] = currValue;
+               Np->inout_p++;
+       }
+
+       if (len == 0)
+               return -1;                              /* No valid roman
numerals. */
+
+       /* Check for truncation. */
+       if (truncated)
+       {
+               ereport(WARNING,
+
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                errmsg("Input truncated to \"%.*s\"",
+                                               len, romanChars)));
+       }

Output after change:

postgres=# SELECT to_number('MMXX  ', 'RN');
WARNING:  Input truncated to "MMXX"
 to_number
-----------
      2020
(1 row)

postgres=# SELECT to_number('  XIV  ', '  RN');
WARNING:  Input truncated to "XIV"
 to_number
-----------
        14
(1 row)

postgres=# SELECT to_number('M CC', 'RN');
WARNING:  Input truncated to "M"
 to_number
-----------
      1000
(1 row)

Also, some modifications to the test cases will be required
if we go with these changes.

Regards,
Hunaid Sohail

Reply via email to