On 05/14/2013 10:05 AM, Simon Riggs wrote: > I'm proposing this now as a possible GSoC project: > > In 1-byte character encodings (i.e. not UTF-8), SUBSTR() is optimised > to allow seeking straight to the exact slice when retrieving a large > toasted value. This reduces I/O considerably when you have large > toasted values since it is an O(1) action rather than an O(N). > > This is possible because the slicing of toasted values is predictable > on 1 byte encodings. > > It would be useful to have a predictable function perform the slicing, > so we could use that knowledge later to optimise searches in a wider > range of situations. More specifically, since UTF-8 is so common, to > allow optimisations in that encoding of common data: text, XML, JSON. > > e.g. if we knew that an XML document has a required element called > TITLE and that occurs only once and always in the first slice, it > would be useful information to use in search functions. (Not sure, but > it may be possible to assign non-consecutive slice numbers to allow > variable data mid-way through a column value if needed). > > e.g. in UTF-8 free text we could put 500 characters in each slice, so > that even if that could be anywhere between 500 and 2000 bytes it > would still fit just fine. > > e.g. for arrays, if we put say 200 elements per slice, then accessing > particular elements would require only 1 slice retrieval. > > Doing this would *possibly* reduce packing density, but not certainly > so. But it would greatly improve access times to large structured > toast values. On the contrary, as it would enable us to pack the chunks fitting more on the page, especially for :)
That is, first chunk into N bytes, then compress each chunk ----------------- Hannu -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers