Hi, I have started working on the next version of the patch and have addressed the smaller issues identified by Tom. Before proceeding further, I would like to have opinions on some comments/questions in my previous email.
Regards, Hunaid Sohail On Mon, Sep 9, 2024 at 5:45 PM Hunaid Sohail <hunaidp...@gmail.com> wrote: > Hi, > > On Sun, Sep 8, 2024 at 4:09 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > >> A few notes from a quick read of the patch: >> >> * roman_to_int() should have a header comment, notably explaining its >> result convention. I find it fairly surprising that "0" means an >> error --- I realize that Roman notation can't represent zero, but >> wouldn't it be better to use -1? >> > > Noted. > > >> >> * Do *not* rely on toupper(). There are multiple reasons why not, >> but in this context the worst one is that in Turkish locales it's >> likely to translate "i" to "İ", on which you will fail. I'd use >> pg_ascii_toupper(). >> > > Noted. > > >> >> * I think roman_to_int() is under-commented internally too. >> To start with, where did the magic "15" come from? And why >> should we have such a test anyway --- what if the format allows >> for trailing stuff after the roman numeral? (That is, I think >> you need to fix this so that roman_to_int reports how much data >> it ate, instead of assuming that it must read to the end of the >> input.) > > > MMMDCCCLXXXVIII is the longest valid standard roman numeral (15 > characters). I will add appropriate comment on length check. > > I am not sure I am able to understand the latter part. Could you please > elaborate it? > Are you referring to cases like these? > SELECT to_number('CXXIII foo', 'RN foo'); > SELECT to_number('CXXIII foo', 'RN'); > Please check my comments on Oracle's behaviour below. > > > >> The logic for detecting invalid numeral combinations >> feels a little opaque too. Do you have a source for the rules >> you're implementing, and if so could you cite it? >> > > There are many sources on the internet. > A few sources: > 1. https://www.toppr.com/guides/maths/knowing-our-numbers/roman-numerals/ > 2. https://www.cuemath.com/numbers/roman-numerals/ > > Note that we are following the standard form: > https://en.wikipedia.org/wiki/Roman_numerals#Standard_form > > >> >> * This code will get run through pgindent before commit, so you >> might want to revisit whether your comments will still look nice >> afterwards. There's not a lot of consistency in them about initial >> cap versus lower case or trailing period versus not, too. >> > > Noted. > > >> >> * roman_result could be declared where used, no? >> > > Noted. > > >> >> * I'm not really convinced that we need a new errcode >> ERRCODE_INVALID_ROMAN_NUMERAL rather than using a generic one like >> ERRCODE_INVALID_TEXT_REPRESENTATION. However, if we do do it >> like that, why did you pick the out-of-sequence value 22P07? >> > > 22P07 is not out-of-sequence. 22P01 to 22P06 are already used. > However, I do agree with you that we can use > ERRCODE_INVALID_TEXT_REPRESENTATION. I will change it. > > >> >> * Might be good to have a few test cases demonstrating what happens >> when there's other stuff combined with the RN format spec. Notably, >> even if RN itself won't eat whitespace, there had better be a way >> to write the format string to allow that. >> > > The current patch (v3) simply ignores other formats with RN except when RN > is used EEEE which returns error. > ``` > postgres=# SELECT to_number('CXXIII', 'foo RN'); > to_number > ----------- > 123 > (1 row) > > postgres=# SELECT to_number('CXXIII', 'MIrn'); > to_number > ----------- > 123 > (1 row) > > postgres=# SELECT to_number('CXXIII', 'EEEErn'); > ERROR: "EEEE" must be the last pattern used > ``` > > However, I think that other formats except FM should be rejected when used > with RN in NUMDesc_prepare function. Any opinions? > > If we look into Oracle, they are strict in this regard and don't allow any > other format with RN. > 1. SELECT to_char(12, 'MIrn') from dual; > 2. SELECT to_char(12, 'foo RN') from dual; > results in error: > ORA-01481: invalid number format model > > I also found that there is no check when RN is used twice (both in to_char > and to_number) and that results in unexpected output. > > ``` > postgres=# SELECT to_number('CXXIII', 'RNrn'); > to_number > ----------- > 123 > (1 row) > > postgres=# SELECT to_char(12, 'RNrn'); > to_char > -------------------------------- > XII xii > (1 row) > > postgres=# SELECT to_char(12, 'rnrn'); > to_char > -------------------------------- > xii xii > (1 row) > > postgres=# SELECT to_char(12, 'FMrnrn'); > to_char > --------- > xiixii > (1 row) > ``` > > >> * Further to Aleksander's point about lack of test coverage for >> the to_char direction, I see from >> >> https://coverage.postgresql.org/src/backend/utils/adt/formatting.c.gcov.html >> that almost none of the existing roman-number code paths are covered >> today. While it's not strictly within the charter of this patch >> to improve that, maybe we should try to do so --- at the very >> least it'd raise formatting.c's coverage score a few notches. >> >> > I see that you have provided a patch to increase test coverage of to_char > roman format including some other formats. I will try to add more tests for > the to_number roman format. > > > I will provide the next patch soon. > > Regards, > Hunaid Sohail >