Re: [HACKERS] Slicing TOAST

Hannu Krosing Tue, 14 May 2013 00:51:34 -0700

On 05/14/2013 10:05 AM, Simon Riggs wrote:
> I'm proposing this now as a possible GSoC project:
>
> In 1-byte character encodings (i.e. not UTF-8), SUBSTR() is optimised
> to allow seeking straight to the exact slice when retrieving a large
> toasted value. This reduces I/O considerably when you have large
> toasted values since it is an O(1) action rather than an O(N).
>
> This is possible because the slicing of toasted values is predictable
> on 1-byte encodings.
>
> It would be useful to have a predictable function perform the slicing,
> so we could use that knowledge later to optimise searches in a wider
> range of situations. More specifically, since UTF-8 is so common, to
> allow optimisations in that encoding of common data: text, XML, JSON.
>
> e.g. if we knew that an XML document has a required element called
> TITLE and that occurs only once and always in the first slice, it
> would be useful information to use in search functions. (Not sure, but
> it may be possible to assign non-consecutive slice numbers to allow
> variable data mid-way through a column value if needed).
>
> e.g. in UTF-8 free text we could put 500 characters in each slice, so
> that even if that could be anywhere between 500 and 2000 bytes it
> would still fit just fine.
>
> e.g. for arrays, if we put say 200 elements per slice, then accessing
> particular elements would require only 1 slice retrieval.
>
> Doing this would *possibly* reduce packing density, but not certainly
> so. But it would greatly improve access times to large structured
> toast values.
On the contrary, it would enable us to pack the chunks so that more of
them fit on the page, especially for :)

That is, first split the value into chunks of N bytes, then compress each chunk.
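
A rough sketch of what I mean by chunk-then-compress, in Python rather than backend C. Everything here is an assumption for illustration: the names chunk_then_compress and read_slice are hypothetical, CHUNK_SIZE is an arbitrary N, and zlib stands in for whatever compressor TOAST would actually use. The point is only that a byte offset maps directly to a chunk number, so a slice read touches just the chunks it overlaps.

```python
import zlib
from typing import List

CHUNK_SIZE = 2000  # hypothetical N, chosen only for this illustration


def chunk_then_compress(value: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split value into fixed-size pieces first, then compress each piece on its own."""
    return [
        zlib.compress(value[i:i + chunk_size])
        for i in range(0, len(value), chunk_size)
    ]


def read_slice(chunks: List[bytes], offset: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return value[offset:offset+length], decompressing only the overlapping chunks."""
    if length <= 0:
        return b""
    first = offset // chunk_size                 # chunk holding the first byte
    last = (offset + length - 1) // chunk_size   # chunk holding the last byte
    data = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    start = offset - first * chunk_size          # position inside the first chunk
    return data[start:start + length]
```

Because every chunk holds exactly N uncompressed bytes (except possibly the last), the chunk number for any offset is plain integer division, and the compressed chunks can be smaller than N on disk.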

-----------------
Hannu



