Hunaid Sohail <hunaidp...@gmail.com> writes: > On Sat, Jan 18, 2025 at 5:27 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > Initially, when I looked at the test case: > SELECT to_number('M CC', 'RN');
> I was confused about whether it should be 'MCC'. After further inspection, > I realized that the output is 1000 for 'M'. The format of the input can be a > bit unclear. Perhaps we could add a note in the documentation > or issue a warning in roman_to_int function when input is truncated, > to clarify this behavior. I don't think a warning is a great idea at all. There is no other place in formatting.c that issues warnings, and this doesn't seem like the place to start. As an example, regression=# select to_number('123garbage', '999999'); to_number ----------- 123 (1 row) There's certainly a case that could be made that to_number should be stricter about trailing garbage, but it's probably a couple decades too late to change now. In any case it seems completely inappropriate to me for RN to adopt that policy all by itself. A different way that we could potentially look at this example is that RN should be willing to skip embedded spaces, leading to 'M CC' producing 1200 not 1000. There is a little bit of support for that idea in the existing behavior regression=# select to_number('123 456', '999999'); to_number ----------- 123456 (1 row) However, when you poke at that a bit closer, it's not a license for unlimited whitespace: regression=# select to_number('123 456', '999999'); to_number ----------- 12345 (1 row) I've not tracked down the exact cause of that, but I think it has to do with the fact that NUM_numpart_from_char() is willing to eat a single leading space per call, even if it's not the first call. The original author's reasoning for that is lost to history thanks to the lack of comments, but I believe that the idea was not "eat random whitespace" but "consume a space that was emitted as a substitute for a plus sign". The fact that it can happen once per '9' code feels more like a bug than something intentional. I'm not proposing that we do something about that, at least not today, but it doesn't seem like something we want to model RN's behavior on either. So I'm not in favor of allowing embedded spaces. However ... in poking at this, I noticed that I had been mis-understanding what to_char(..., 'RN') does: it fills to 15 characters with *leading* spaces not trailing spaces as I'd been thinking: regression=# select to_char(433, '|RN|'); to_char ------------------- | CDXXXIII| (1 row) So on the argument that to_number's RN should be willing to consume what to_char's RN produces, we're both wrong and roman_to_int should be willing to eat leading spaces. What do you think? regards, tom lane