On Mon, Feb 28, 2022 at 3:22 PM davinder singh <davindersingh2...@gmail.com> wrote:
>
> Hi,
>
> For TOAST storage [1] in PostgreSQL, the attribute value is first compressed
> and then divided into chunks. The problem with storing the compressed value
> is that if compression does not save enough space to reduce the number of
> chunks, we end up paying extra decompression cost on every read.
> Based on a discussion with Robert and Dilip, we tried to optimize this
> process. The idea is to TOAST the compressed value only if it saves at least
> one chunk (2KB by default) of disk storage; otherwise, store the uncompressed
> value. This way, we save the decompression cost without losing much storage.
>
> In our tests, we observed read performance improvements of up to 28% by
> giving up at most TOAST_MAX_CHUNK_SIZE (2KB) bytes of disk storage per value.
> The gain is larger for large attributes (size > 4KB) because
> compression/decompression cost grows with size.
> However, we have found that this assumption does not hold when the data
> compresses very well and the size is barely big enough to cause TOASTing.
> For example, in test 4.b below, we saw no performance advantage, but the
> table size grew by 42% because uncompressed values were stored.
>
> Test Setup.
>
> Create table t1_lz4 (a text compression lz4, b text compression lz4);
>
> -- Generate random data
> create or replace function generate_att_data(len_info int)
> returns text
> language plpgsql
> as
> $$
> declare
>   value text;
> begin
>   select array_agg(md5(g::text))
>   into value
>   from generate_series(1, round(len_info/33)::int) g;
>   return value;
> end;
> $$;
>
> -- Test query
> Select b from t1_lz4;
>
> Test 1:
> Data: rows 200000
> insert into t1_lz4(a, b) select generate_att_data(364),
>   repeat(generate_att_data(1980), 2);
> Summary:
> Attribute size: original: 7925 bytes, after compression: 7845 bytes
> Time for select: head: 42 sec, patch: 37 sec, Performance Gain: 11%
> Table size: head: 1662 MB, patch: 1662 MB
>
> Test 2:
> Data: rows 100000
> insert into t1_lz4(a, b) select generate_att_data(364),
>   generate_att_data(16505);
> Summary:
> Attribute size: original: 16505 bytes, after compression: 16050 bytes
> Time for select: head: 35.6 sec, patch: 30 sec, Performance Gain: 14%
> Table size: head: 1636 MB, patch: 1688 MB
>
> Test 3:
> Data: rows 50000
> insert into t1_lz4(a, b) select generate_att_data(364),
>   generate_att_data(31685);
> Summary:
> Attribute size: original: 31685 bytes, after compression: 30263 bytes
> Time for select: head: 35.4 sec, patch: 25.5 sec, Performance Gain: 28%
> Table size: head: 1601 MB, patch: 1601 MB
>
> Test 4.a:
> Data: rows 200000
> insert into t1_lz4(a, b) select generate_att_data(11),
>   repeat('b', 250) || generate_att_data(3885);
> Summary:
> Attribute size: original: 3885 bytes, after compression: 3645 bytes
> Time for select: head: 28 sec, patch: 26 sec, Performance Gain: 7%
> Table size: head: 872 MB, patch: 872 MB
>
> Test 4.b (High compression):
> Data: rows 200000
> insert into t1_lz4(a, b) select generate_att_data(364),
>   repeat(generate_att_data(1980), 2);
> Summary:
> Attribute size: original: 3966 bytes, after compression: 2012 bytes
> Time for select: head: 27 sec, patch: 26 sec, Performance Gain: 0%
> Table size: head: 612 MB, patch: 872 MB
I think the idea looks interesting and the results are also promising. I have
a few initial comments; I will do a more detailed review later.

1.
+ if (*value == orig_toast_value)
+ toast_tuple_externalize(ttc, attribute, options);

Wouldn't it be cleaner to check whether the attribute is compressed, like
this: VARATT_IS_COMPRESSED(DatumGetPointer(*value))?

2.
+ /* Using the uncompressed data instead, deleting compressed data. */
+ pfree(DatumGetPointer(*value));

Change as below:
s/deleting compressed data/release memory for the compressed data/

3.
+ memcpy(toast_attr_copy, toast_attr, sizeof(toast_attr));
+ memcpy(toast_values_copy, toast_values, sizeof(toast_values));
+

Add a comment here explaining what we are trying to copy and why.

4.
+ /* incompressible, ignore on subsequent compression passes. */
+ orig_attr->tai_colflags |= TOASTCOL_INCOMPRESSIBLE;

Do we need to set TOASTCOL_INCOMPRESSIBLE while trying to externalize the
attribute? The comment says "ignore on subsequent compression passes", but
will there be any more compression passes after this point? If we do need to
set TOASTCOL_INCOMPRESSIBLE here, the comment should explain why.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com