On 29/07/18 14:47, Andrey Borodin wrote:
Fixed both problems. PFA v14.
Thanks, took a really quick look at this. The text being added to README is outdated for these latest changes.
In second step I still use paloc's memory, but only to store two bitmaps: bitmap of internal pages and bitmap of empty leafs. Second physical scan only reads internal pages. I can omit that bitmap, if I'll scan everything. Also, I can replace emptyLeafs bitmap with array\list, but I do not really think it will be big.
On a typical GiST index, what's the ratio of leaf vs. internal pages? Perhaps an array would indeed be better. If you have a really large index, the bitmaps can take a fair amount of memory, on top of the memory used for tracking the dead TIDs. I.e. that memory will be in addition to maintenance_work_mem. That's not nice, but I think it's OK in practice, and not worth spending too much effort to eliminate. For a 1 TB index with default 8k block size, the two bitmaps will take 32 MB of memory in total. If you're dealing with a database of that size, you ought to have some memory to spare. But if an array would use less memory, that'd be better.
If you go with bitmaps, please use the existing Bitmapset instead of rolling your own. Saves some code, and it provides more optimized routines for iterating through all the set bits, too (bms_next_member()). Another possibility would be to use Tidbitmap, in the "lossy" mode, i.e. add the pages with tbm_add_page(). That might save some memory, compared to Bitmapset, if the bitmap is very sparse. Not sure how it compares with a plain array.
A straightforward little optimization would be to skip scanning the internal pages, when the first scan didn't find any empty pages. And stop the scan of the internal pages as soon as all the empty pages have been recycled.
- Heikki