On Thu, Mar 14, 2024 at 8:53 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Thu, Mar 14, 2024 at 9:59 AM John Naylor <johncnaylo...@gmail.com> wrote:
> > > BTW do we still want to test the tidstore by using a combination of
> > > SQL functions? We might no longer need to input TIDs via a SQL
> > > function.
> >
> > I'm not sure. I stopped short of doing that to get feedback on this
> > much. One advantage with SQL functions is we can use generate_series
> > to easily input lists of blocks with different numbers and strides,
> > and array literals for offsets are a bit easier. What do you think?
>
> While I'm not a fan of the following part, I agree that it makes sense
> to use SQL functions for test data generation:
>
> -- Constant values used in the tests.
> \set maxblkno 4294967295
> -- The maximum number of heap tuples (MaxHeapTuplesPerPage) in 8kB block is 291.
> -- We use a higher number to test tidstore.
> \set maxoffset 512
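(For reference, the generate_series approach mentioned above might look
roughly like the following -- the do_set_block_offsets() signature here
is assumed for illustration, not quoted from the module:)

```sql
-- Rough sketch (signature assumed): set a tid bitmap for every
-- 1000th block, reusing the constants from the test script.
SELECT do_set_block_offsets(blk, ARRAY[1, 2, :maxoffset]::int2[])
  FROM generate_series(0, :maxblkno, 1000) AS blk;
```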
I'm not really a fan of these constants either, and they could be
removed at some point if we've done everything else nicely.

> It would also be easier for developers to test the tidstore with their
> own data set. So I agreed with the current approach; use SQL functions
> for data generation and do the actual tests inside C functions.

Okay, here's another idea: Change test_lookup_tids() to be more
general and put the validation down into C as well. First we save the
blocks from do_set_block_offsets() into a table, then with all those
blocks look up a sufficiently large range of possible offsets and save
found values in another array. So the static items structure would
have 3 arrays: inserts, successful lookups, and iteration (currently
the iteration output is private to check_set_block_offsets()). Then
sort as needed and check they are all the same.

Further thought: We may not really need to test block numbers that
vigorously, since the radix tree tests should cover keys/values pretty
well. The difference here is using bitmaps of tids, and that should be
well covered. Locally (not in CI), we should try big inputs to make
sure we can actually go up to many GB -- it's easier and faster this
way than having vacuum give us a large data set.

> Is it
> convenient for developers if we have functions like generate_tids()
> and generate_random_tids() to generate TIDs so that they can pass them
> to do_set_block_offsets()?

I guess I don't see the advantage of adding a layer of indirection at
this point, but it could be useful at a later time.
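(Concretely, the SQL side of that idea might look something like the
following -- the temp table, the stride, and the assumption that
do_set_block_offsets() returns the block number it set are all
hypothetical details for illustration:)

```sql
-- Hypothetical sketch: remember which blocks we set, then let the
-- C side validate lookups and iteration over all of them.
CREATE TEMP TABLE test_blocks (blkno bigint);

-- Assumes do_set_block_offsets() returns the block number it set.
INSERT INTO test_blocks
  SELECT do_set_block_offsets(blk, ARRAY[1, :maxoffset]::int2[])
    FROM generate_series(0, :maxblkno, :maxblkno / 8) AS blk;

-- The C function would then sort and compare the insert, lookup,
-- and iteration arrays internally.
SELECT check_set_block_offsets();
```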