On Mon, Jul 18, 2022 at 4:15 AM Martin Kalcher <martin.kalc...@aboutsource.net> wrote:
> On 17.07.22 at 08:00, Thomas Munro wrote:
> >> Actually ... is there a reason to bother with an intarray version
> >> at all, rather than going straight for an in-core anyarray function?
> >> It's not obvious to me that an int4-only version would have
> >> major performance advantages.
> >
> > Yeah, that seems like a good direction. If there is a performance
> > advantage to specialising, then perhaps we only have to specialise on
> > size, not type. Perhaps there could be a general function that
> > internally looks out for typbyval && typlen == 4, and dispatches to a
> > specialised 4-byte, and likewise for 8, if it can, and that'd already
> > be enough to cover int, bigint, float etc, without needing
> > specialisations for each type.
>
> I played around with the idea of an anyarray shuffle(). The hard part
> was to deal with arrays with variable length elements, as they cannot
> be swapped easily in place. I solved it by creating an intermediate
> array of references to the elements. I'll attach a patch with the proof
> of concept. Unfortunately it is already about 5 times slower than the
> specialised version and I am not sure if it is worth going down that road.

Seems OK for a worst case. It must still be a lot faster than doing it in SQL.

Now I wonder what the exact requirements would be to dispatch to a faster version that would handle int4. I haven't studied this in detail but perhaps to dispatch to a fast shuffle for objects of size X, the requirement would be something like

    typlen == X && align_bytes <= typlen && typlen % align_bytes == 0

where align_bytes is typalign converted to ALIGNOF_{CHAR,SHORT,INT,DOUBLE}? Or in English, 'the data consists of densely packed objects of fixed size X, no padding'. Or perhaps you can work out the padded size and use that, to catch a few more types. Then you call array_shuffle_{2,4,8}() as appropriate, which should be as fast as your original int[] proposal, but work also for float, date, ...?

About your experimental patch, I haven't reviewed it properly or tried it but I wonder if uint32 dat_offset, uint32 size (= half size elements) would be enough due to limitations on varlenas.
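
To make the dispatch condition above concrete, here is a rough standalone sketch, not code from any patch. The helper names align_bytes_for() and fast_shuffle_width() are made up for illustration, the 1/2/4/8 alignment values are an assumption for a typical 64-bit platform (in the tree they would come from ALIGNOF_{CHAR,SHORT,INT,DOUBLE}), and rand() merely stands in for whatever random source the real function would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: map a pg_type.typalign code to a byte alignment.
 * The values are assumed for a typical 64-bit platform.
 */
static int
align_bytes_for(char typalign)
{
    switch (typalign)
    {
        case 'c': return 1;     /* char alignment */
        case 's': return 2;     /* short alignment */
        case 'i': return 4;     /* int alignment */
        case 'd': return 8;     /* double alignment */
        default:  return 0;     /* unknown code */
    }
}

/*
 * Hypothetical helper: return the element width (2, 4 or 8) of the
 * specialised shuffle to use, or 0 to fall back to the generic path.
 * Implements: typlen == X && align_bytes <= typlen &&
 *             typlen % align_bytes == 0, for X in {2, 4, 8}.
 */
static int
fast_shuffle_width(int typlen, char typalign)
{
    int align_bytes = align_bytes_for(typalign);

    if (align_bytes == 0)
        return 0;
    if (typlen != 2 && typlen != 4 && typlen != 8)
        return 0;               /* also rejects varlena (-1) and cstring (-2) */
    if (align_bytes <= typlen && typlen % align_bytes == 0)
        return typlen;          /* densely packed, no padding */
    return 0;
}

/*
 * Sketch of the 4-byte specialisation: an in-place Fisher-Yates shuffle.
 * rand() % i has a slight modulo bias; it is used here only to keep the
 * example short.
 */
static void
array_shuffle_4(uint32_t *elems, size_t n)
{
    assert(elems != NULL || n == 0);

    for (size_t i = n; i > 1; i--)
    {
        size_t   j = (size_t) rand() % i;   /* 0 <= j < i */
        uint32_t tmp = elems[i - 1];

        elems[i - 1] = elems[j];
        elems[j] = tmp;
    }
}
```

With these assumptions, int4 and date (typlen 4, typalign 'i') and float4 would all take the 4-byte path, int8 and float8 the 8-byte path, and anything with typlen -1 would fall back to the generic version with the intermediate array of references.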