On Mon, 2024-04-15 at 17:05 -0700, Andres Freund wrote:
> Can't we test this as part of the normal testsuite?
One thing that complicates things a bit is that the test compares the results against ICU, so a mismatch in Unicode version between ICU and Postgres can cause test failures. The test ignores unassigned code points, so normally a version mismatch just results in less-exhaustive test coverage. But sometimes things really do change, and that would cause a failure. I'm not quite sure how we should handle that -- maybe only run the test when the ICU version is known to be in a range where that's not a problem?

Another option is to look for another way to test this code without ICU. We could generate a list of known mappings and compare to that, but we'd have to do it some way other than what the code is doing now; otherwise we'd just be testing the code against itself. Maybe we can load the Unicode data into a Postgres table and then test with a SELECT statement or something? I am worried that it will end up looking like an over-engineered way to compare a text file to itself.

Stepping back a moment, my top worry is really not to test those C functions, but to test the perl code that parses the text files and generates those arrays. Imagine a future Unicode version does something that the perl scripts didn't anticipate, and they fail to add array entries for half the code points, or something like that. By testing the arrays generated from freshly-parsed files exhaustively against ICU, we have a good defense against that. That situation really only comes up when updating Unicode.

That's not to say that the C code shouldn't be tested, of course. Maybe we can just do some spot checks for the functions that are reachable via SQL and get rid of the functions that aren't yet reachable (and re-add them when they are)?

> I don't at all like that the tests depend on downloading new unicode
> data. What if there was an update but I just want to test the current
> state?

I was mostly following the precedent for normalization. Should we change that, also?

Regards,
Jeff Davis
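For illustration, the exhaustive comparison against ICU discussed in this message might be structured roughly as below. This is a minimal sketch under stated assumptions, not the actual Postgres test: `compare_case_mapping()`, the function-pointer types, and the three `stub_*` functions are hypothetical names. The stubs stand in for a lookup in the generated arrays and for ICU so that the sketch runs without either. In a real test, the reference might wrap ICU's `u_toupper()`, and the "assigned" predicate might check `u_charType(cp) != U_UNASSIGNED` (possibly for both Unicode versions). Skipping unassigned code points is what turns ordinary version skew into reduced coverage instead of a failure. The second stub simulates the generator bug described above, where the perl scripts drop entries for part of the range.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Highest Unicode code point. */
#define MAX_CODE_POINT 0x10FFFF

/* Hypothetical signatures: simple (1:1) case mapping and an "is assigned" test. */
typedef uint32_t (*case_map_fn) (uint32_t cp);
typedef bool (*assigned_fn) (uint32_t cp);

/*
 * Compare 'tested' (e.g. a lookup in the generated arrays) against
 * 'reference' (e.g. a wrapper around ICU) for every code point that
 * 'is_assigned' accepts.  Returns the number of mismatches and stores
 * the number of code points skipped as unassigned in '*skipped'.
 */
static long
compare_case_mapping(case_map_fn tested, case_map_fn reference,
                     assigned_fn is_assigned, long *skipped)
{
    long        mismatches = 0;

    *skipped = 0;
    for (uint32_t cp = 0; cp <= MAX_CODE_POINT; cp++)
    {
        /* Version skew on unassigned code points only reduces coverage. */
        if (!is_assigned(cp))
        {
            (*skipped)++;
            continue;
        }
        if (tested(cp) != reference(cp))
        {
            /* Report only the first few mismatches to avoid flooding output. */
            if (mismatches < 10)
                fprintf(stderr, "U+%04X: got U+%04X, expected U+%04X\n",
                        (unsigned) cp, (unsigned) tested(cp),
                        (unsigned) reference(cp));
            mismatches++;
        }
    }
    return mismatches;
}

/* Hypothetical stand-in for the reference implementation (ASCII only). */
static uint32_t
stub_reference_upper(uint32_t cp)
{
    return (cp >= 'a' && cp <= 'z') ? cp - ('a' - 'A') : cp;
}

/* Hypothetical generated table that lost every mapping after 'm'. */
static uint32_t
stub_generated_upper(uint32_t cp)
{
    return (cp >= 'a' && cp <= 'm') ? cp - ('a' - 'A') : cp;
}

/* Hypothetical predicate: pretend only ASCII is assigned. */
static bool
stub_is_assigned(uint32_t cp)
{
    return cp < 0x80;
}
```

Comparing `stub_generated_upper` against `stub_reference_upper` reports 13 mismatches ('n' through 'z'). That is the kind of silently missing table entry an exhaustive run would catch, while a handful of spot checks could miss it.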