On Thu, 30 Dec 2021 at 16:40, Teodor Sigaev <teo...@sigaev.ru> wrote:
> We are working on custom toaster for JSONB [1], because current TOAST is
> universal for any data type and because of that it has some disadvantages:
>     - "one toast fits all" may be not the best solution for particular
>       type or/and use cases
>     - it doesn't know the internal structure of data type, so it cannot
>       choose an optimal toast strategy
>     - it can't share common parts between different rows and even
>       versions of rows

Agreed, Oleg has made some very clear analysis of the value of having
a higher degree of control over toasting from within the datatype.

In my understanding, we want to be able to
1. Access data from a toasted object one slice at a time, by using
knowledge of the structure
2. If toasted data is updated, then update a minimum number of
slice(s), without rewriting the existing slices
3. If toasted data is expanded, then allow new slices to be appended to
the object without rewriting the existing slices

> Modification of current toaster for all tasks and cases looks too
> complex, moreover, it will not works for custom data types. Postgres
> is an extensible database, why not to extent its extensibility even
> further, to have pluggable TOAST! We propose an idea to separate
> toaster from heap using toaster API similar to table AM API etc.
> Following patches are applicable over patch in [1]

ISTM that we would want the toast algorithm to be associated with the
datatype, not the column?
Can you explain your thinking?

We already have Expanded toast format, in-memory, which was designed
specifically to allow us to access sub-structure of the datatype
in-memory. So I was expecting to see an Expanded, on-disk, toast
format that roughly matched that concept, since Tom has already shown
us the way (varatt_expanded). This would be usable by both JSON and
PostGIS.
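
To make the three goals above a bit more concrete, here is a rough
in-memory sketch in C. It is only an illustration of slice-at-a-time
access, minimal-slice update and append-without-rewrite; it is not taken
from the patch or from the PostgreSQL source, and every name in it
(SlicedValue, SLICE_SIZE, sv_read, sv_update, sv_append) is hypothetical.
Each function returns the number of slices it touched, so the cost of an
operation is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SLICE_SIZE 4            /* hypothetical; tiny so the effect is easy to see */

/* Hypothetical sliced value: a table of independently stored slices. */
typedef struct SlicedValue
{
    unsigned char **slices;     /* one buffer of SLICE_SIZE bytes per slice */
    size_t      nslices;        /* number of allocated slices */
    size_t      length;         /* logical length of the value in bytes */
} SlicedValue;

static void
sv_init(SlicedValue *v)
{
    memset(v, 0, sizeof *v);
}

/* Goal 1: read a byte range, visiting only the slices that cover it. */
static size_t
sv_read(const SlicedValue *v, size_t off, unsigned char *out, size_t len)
{
    size_t      touched = 0;

    assert(off + len <= v->length);
    while (len > 0)
    {
        size_t      pos = off % SLICE_SIZE;
        size_t      n = SLICE_SIZE - pos;

        if (n > len)
            n = len;
        memcpy(out, v->slices[off / SLICE_SIZE] + pos, n);
        off += n; out += n; len -= n; touched++;
    }
    return touched;
}

/* Goal 2: overwrite a byte range, rewriting only the slices that cover it. */
static size_t
sv_update(SlicedValue *v, size_t off, const unsigned char *data, size_t len)
{
    size_t      touched = 0;

    assert(off + len <= v->length);
    while (len > 0)
    {
        size_t      pos = off % SLICE_SIZE;
        size_t      n = SLICE_SIZE - pos;

        if (n > len)
            n = len;
        memcpy(v->slices[off / SLICE_SIZE] + pos, data, n);
        off += n; data += n; len -= n; touched++;
    }
    return touched;
}

/*
 * Goal 3: append bytes. Only the partly filled last slice and any new
 * slices are written; earlier slice buffers are neither copied nor moved.
 * Returns the number of slices written, or -1 if allocation fails.
 */
static int
sv_append(SlicedValue *v, const unsigned char *data, size_t len)
{
    size_t      off = v->length;
    size_t      need = (v->length + len + SLICE_SIZE - 1) / SLICE_SIZE;

    if (need > v->nslices)
    {
        /* Grow only the slice table, not the slices themselves. */
        unsigned char **p = realloc(v->slices, need * sizeof *p);

        if (p == NULL)
            return -1;
        v->slices = p;
        while (v->nslices < need)
        {
            unsigned char *s = calloc(1, SLICE_SIZE);

            if (s == NULL)
                return -1;
            v->slices[v->nslices++] = s;
        }
    }
    v->length += len;
    return (int) sv_update(v, off, data, len);
}
```

With SLICE_SIZE set to 4, appending 5 bytes to a 10-byte value writes two
slices (the half-full third slice and one new slice) and leaves the first
two untouched. An on-disk format would need the same property from the
toast relation, which this sketch does not attempt to model.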

Some other thoughts:

I imagine the data type might want to keep some kind of dictionary
inside the main toast pointer, so we could make allowance for some
optional datatype-specific private area in the toast pointer itself,
allowing a mix of inline and out-of-line data, and/or a table of
contents to the slices.

I'm thinking we could also tackle these things at the same time:
* We want to expand TOAST to 64-bit pointers, so we can have more
pointers in a table
* We want to avoid putting the data length into the toast pointer, so
we can allow the toasted data to be expanded without rewriting
everything (to avoid O(N^2) cost)

-- 
Simon Riggs                http://www.EnterpriseDB.com/