On Fri, Sep 6, 2024 at 11:32 AM Tatsuo Ishii <is...@postgresql.org> wrote: > > > Thanks for making the adjustments to this. > > > > I don't think there is any need to call tuplestore_updatemax() from > > within writetup_heap(). That means having to update the maximum space > > used every time a tuple is written to disk. That's a fairly massive > > overhead. > > > > Instead, it should be fine to modify tuplestore_updatemax() to set a > > flag to true if state->status != TSS_INMEM and then record the disk > > space used. That flag won't ever be set to false again. > > tuplestore_storage_type_name() should just return "Disk" if the new > > disk flag is set, even if state->status == TSS_INMEM. Since the > > work_mem size won't change between tuplestore_clear() calls, if we've > > once spilt to disk, then we shouldn't care about the memory used for > > runs that didn't. Those will always have used less memory. > > > > I did this quickly, but playing around with the attached, I didn't see > > any slowdown. > > Your patch looks good to me and I confirmed that with your patch I > didn't see any slowdown either. Thanks!
The changes look better. A nitpick though. With their definitions changed, I think it's better to change the names of the functions since their purpose has changed. Right now they report the storage type and size used, respectively, at the time of calling the function. With this patch, they report maximum space ever used and the storage corresponding to the maximum space. tuplestore_space_used() may be changed to tuplestore_maxspace_used(). I am having difficulty with tuplestore_storage_type_name(); tuplestore_largest_storage_type_name() seems mouthful and yet not doing justice to the functionality. It might be better to just have one funciton tuplestore_maxspace_used() which returns both the maximum space used as well as the storage type when maximum space was used. The comments need a bit of grammar fixes, but that can be done when finalizing the patches. -- Best Wishes, Ashutosh Bapat