On Fri, Mar 7, 2025 at 8:36 PM Nikhil Kumar Veldanda <veldanda.nikhilkuma...@gmail.com> wrote:
>     struct /* Extended compression format */
>     {
>         uint32 va_header;
>         uint32 va_tcinfo;
>         uint32 va_cmp_alg;
>         uint32 va_cmp_dictid;
>         char va_data[FLEXIBLE_ARRAY_MEMBER];
>     } va_compressed_ext;
> } varattrib_4b;

First, thanks for sending along the performance results. I agree that those are promising.

Second, thanks for sending these design details. The idea of keeping dictionaries in pg_zstd_dictionaries literally forever doesn't seem very appealing, but I'm not sure what the other options are. I think we've established in previous work in this area that compressed values can creep into unrelated tables and inside records or other container types like ranges. Therefore, we have no good way of knowing when a dictionary is unreferenced and can be dropped. So in that sense your decision to keep them forever is "right," but it's still unpleasant. It would even be necessary to make pg_upgrade carry them over to new versions.

If we could make sure that compressed datums never leaked out into other tables, then tables could depend on dictionaries and dictionaries could be dropped when there were no longer any tables depending on them. But like I say, previous work suggested that this would be very difficult to achieve. However, without that, I imagine users generating new dictionaries regularly as the data changes and eventually getting frustrated that they can't get rid of the old ones.

--
Robert Haas
EDB: http://www.enterprisedb.com