I recently stumbled upon the following code in ginfast.c:

    while (collector->ntuples + nentries > collector->lentuples)
    {
        collector->lentuples *= 2;
        collector->tuples = (IndexTuple *) repalloc(collector->tuples,
                                  sizeof(IndexTuple) * collector->lentuples);
    }
It does not seem very smart to perform the repalloc() inside the loop here, as the loop may iterate many times before lentuples has been doubled enough to allow storage of all the required values.

The attached patch changes things around so that the repalloc() is done outside of the loop, i.e. only once, after we've determined the correct size to reallocate to. I've also added an else condition so that we only bother checking this case when the tuples[] array is not already allocated.

I tested with the following:

    create table t1 (a int[], b int[]);
    create index on t1 using gin (a,b) with (fastupdate = on);
    truncate t1;
    insert into t1
    select '{1}'::int[], ('{' || string_agg(y::text, ',') || '}')::int[]
    from generate_series(1,1000000) x, generate_series(1,10) y
    group by x order by x desc;

In the above case, with an unpatched master, I measured the repalloc() to consume only 0.6% of the total runtime of the INSERT, so this does not really improve performance by much, but I thought it was worth fixing nevertheless.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
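For readers without the patch at hand, the idea of computing the target size first and reallocating once can be sketched roughly as below. This is only an illustration, not the contents of the attached patch: the Collector struct and ensure_capacity() are hypothetical stand-ins, plain realloc() is used in place of repalloc() so the example compiles outside the PostgreSQL tree, and it assumes lentuples is already greater than zero and that the doubling does not overflow.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for the collector; the field names mirror the
 * quoted ginfast.c code, but this is not the real PostgreSQL struct.
 */
typedef struct
{
    void  **tuples;     /* stands in for the IndexTuple array */
    size_t  ntuples;    /* number of entries currently in use */
    size_t  lentuples;  /* allocated capacity; assumed to be > 0 */
} Collector;

/*
 * Make room for nentries more entries.  The doubling loop only works out
 * the new capacity; the reallocation happens once, after the loop.
 * Returns 0 on success, -1 if realloc() fails (the array is left as-is).
 */
static int
ensure_capacity(Collector *c, size_t nentries)
{
    size_t  newlen = c->lentuples;
    void  **newtuples;

    /* Nothing to do if the existing array is already large enough. */
    if (c->ntuples + nentries <= newlen)
        return 0;

    /* Determine the final size first, without touching the allocation. */
    while (c->ntuples + nentries > newlen)
        newlen *= 2;

    /* A single reallocation to the final size. */
    newtuples = realloc(c->tuples, sizeof(void *) * newlen);
    if (newtuples == NULL)
        return -1;

    c->tuples = newtuples;
    c->lentuples = newlen;
    return 0;
}
```

With a capacity of 4 and 3 entries in use, asking for room for 30 more entries walks the size through 8, 16 and 32 to 64, but calls realloc() only once instead of four times.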
Attachment: fix_gin_fast_insert.patch