Hi Moon-san. Thank you for posting.
We are testing the GIN index onJSONB type. The default maintenance_work_mem (64MB) was fine in usually cases. However, this problem occurs when indexing very large JSONB data. best regards, Keisuke Kuroda 2019年8月29日(木) 17:20 Moon, Insung <tsukiwamoon.pg...@gmail.com>: > Dear Hackers. > > Kuroda-san and I are interested in the GIN index and have been testing > various things. > While testing, we are found a little bug. > Some cases, the value of nEntries in the metapage was set to the wrong > value. > > This is a reproduce of bug situation. > =# SET maintenance_work_mem TO '1MB'; > =# CREATE TABLE foo(i jsonb); > =# INSERT INTO foo(i) select jsonb_build_object('foobar001', i) FROM > generate_series(1, 10000) AS i; > > # Input the same value again. > =# INSERT INTO foo(i) select jsonb_build_object('foobar001', i) FROM > generate_series(1, 10000) AS i; > # Creates GIN Index. > =# CREATE INDEX foo_idx ON foo USING gin (i jsonb_ops); > > > =# SELECT * FROM gin_metapage_info(get_raw_page('foo_idx', 0)) WITH > (fastupdate=off); > -[ RECORD 1 ]----+----------- > pending_head | 4294967295 > pending_tail | 4294967295 > tail_free_size | 0 > n_pending_pages | 0 > n_pending_tuples | 0 > n_total_pages | 74 > n_entry_pages | 69 > n_data_pages | 4 > n_entries | 20004 <--★ > version | 2 > > In this example, the nentries value should be 10001 because the gin > index stores duplicate values in one leaf(posting tree or posting > list). > But, if look at the nentries value of metapage using pageinspect, it > is stored as 20004. > So, Let's run the vacuum. > > > =# VACUUM foo; > =# SELECT * FROM gin_metapage_info(get_raw_page('foo_idx', 0)); > -[ RECORD 1 ]----+----------- > pending_head | 4294967295 > pending_tail | 4294967295 > tail_free_size | 0 > n_pending_pages | 0 > n_pending_tuples | 0 > n_total_pages | 74 > n_entry_pages | 69 > n_data_pages | 4 > n_entries | 10001 <--★ > version | 2 > > Ah. Run to the vacuum, nEntries is changing the normal value. > > There is a problem with the ginEntryInsert function. That calls the > table scan when creating the gin index, ginBuildCallback function > stores the new heap value inside buildstate struct. > And next step, If GinBuildState struct is the size of the memory to be > using is equal to or larger than the maintenance_work_mem value, run > to input value into the GIN index. > This process is a function called ginEnctryInsert. > The ginEntryInsert function called at this time determines that a new > entry is added and increase the value of nEntries. > However currently, ginEntryInsert is first to increase in the value of > nEntries, and to determine if there are the same entries in the > current GIN index. > That causes the bug. > > The patch is very simple. > Fix to increase the value of nEntries only when a non-duplicate GIN > index leaf added. > > This bug detection and code fix worked with Kuroda-san. > > Best Regards. > Moon. >