Imai-san,

From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> On 2019/01/22 18:47, David Rowley wrote:
> > On Tue, 22 Jan 2019 at 20:01, Imai, Yoshikazu
> >> What I understand so far is that about 10,000 while-loop iterations
> >> in total (4098 + 4098 + some extra) are needed in hash_seq_search()
> >> in the EXECUTE query after the creation of the generic plan.
> >>
> >> 10,000 while-loop iterations take about 10 microsec (of course, we
> >> can't measure the time exactly), and the difference in latency
> >> between the 5th and 7th EXECUTE is about 8 microsec, so I currently
> >> think this is what causes the difference.
> >>
> >> I don't know whether this problem relates to Amit-san's patch, but
> >> I'll continue to investigate it.
> >
> > I had another thought... when you're making a custom plan, you're only
> > grabbing locks on partitions that were not pruned (just 1 partition in
> > your case), but when making the generic plan, locks will be acquired
> > on all partitions (all 4000 of them).  This likely means that when
> > building the generic plan for the first time, the LockMethodLocalHash
> > table is expanded to fit all those locks, and since we never shrink it
> > down again, it'll remain that size for the rest of your run.  I
> > imagine the reason for the slowdown is that during LockReleaseAll(), a
> > sequential scan is performed over the entire hash table.  I see from
> > looking at the hash_seq_search() code that the value of max_bucket is
> > pretty critical to how it'll perform.  The
> > while ((curElem = segp[segment_ndx]) == NULL) loop will need to run
> > fewer times with a lower max_bucket.
>
> I too think that might be behind that slight drop in performance.  So,
> it's good to know what one of the performance bottlenecks is when
> dealing with a large number of relations in queries.
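To make the effect concrete, here is a toy model (my simplification, with
made-up names, not PostgreSQL's actual dynahash code) of why the cost of
the sequential scan tracks max_bucket rather than the number of live
entries:

/*
 * Toy model (not dynahash itself): a sequential scan has to step over
 * every bucket, empty or not, up to the table's high-water mark, and
 * the table never shrinks after it has grown.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct Entry
{
    int          key;
    struct Entry *next;        /* chain within one bucket */
} Entry;

typedef struct
{
    Entry  **buckets;          /* array of bucket chains */
    int      max_bucket;       /* grows on expansion, never shrinks */
} ToyHash;

/* Visit every live entry, the way LockReleaseAll()'s scan must. */
static long
toy_seq_scan(ToyHash *h)
{
    long    buckets_stepped = 0;

    for (int b = 0; b <= h->max_bucket; b++)
    {
        buckets_stepped++;     /* paid even for empty buckets */
        for (Entry *e = h->buckets[b]; e != NULL; e = e->next)
        {
            /* "release" the lock represented by e (no-op in the toy) */
        }
    }
    return buckets_stepped;
}

int
main(void)
{
    ToyHash h;
    Entry   one = {42, NULL};

    /* Pretend a generic plan once locked ~4000 partitions ... */
    h.max_bucket = 8191;
    h.buckets = calloc(h.max_bucket + 1, sizeof(Entry *));

    /* ... but the current EXECUTE holds just one lock. */
    h.buckets[one.key % (h.max_bucket + 1)] = &one;

    /* One live entry still costs a walk over all 8192 buckets. */
    printf("buckets stepped over: %ld\n", toy_seq_scan(&h));
    free(h.buckets);
    return 0;
}

Even with a single live lock, the scan steps over every bucket up to the
high-water mark, which seems consistent with the extra while-loop
iterations Imai-san counted.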
Can you compare the performance of auto and force_custom_plan again with
the attached patch?  It uses a list of LOCALLOCKs attached to PGPROC
instead of scanning the hash table; a rough sketch of the idea is below.
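The following is only a sketch with hypothetical names (a toy, not the
attached patch itself): each LOCALLOCK is threaded onto an intrusive
per-backend list, in the spirit of lib/ilist.h, when it is first taken,
and the release path walks that list instead of the whole hash table.

#include <stdio.h>

typedef struct LocalLock
{
    int              lockid;
    struct LocalLock *prev;    /* intrusive list links */
    struct LocalLock *next;
} LocalLock;

static LocalLock *held_locks;  /* per-backend list head (hypothetical) */

/* On first acquisition of a lock, remember it on the backend's list. */
static void
remember_lock(LocalLock *l)
{
    l->prev = NULL;
    l->next = held_locks;
    if (held_locks)
        held_locks->prev = l;
    held_locks = l;
}

/* Release path: cost is O(number of locks held), not O(max_bucket). */
static void
release_all_locks(void)
{
    LocalLock *l = held_locks;

    while (l != NULL)
    {
        LocalLock *next = l->next;

        printf("releasing lock %d\n", l->lockid);
        /* in PostgreSQL this would also remove the hash table entry */
        l->prev = l->next = NULL;
        l = next;
    }
    held_locks = NULL;
}

int
main(void)
{
    LocalLock a = {1, NULL, NULL};
    LocalLock b = {2, NULL, NULL};

    remember_lock(&a);
    remember_lock(&b);
    release_all_locks();       /* two steps, however big the table grew */
    return 0;
}

The release cost then depends only on the number of locks actually held,
so a bloated LockMethodLocalHash left behind by building the generic plan
should no longer slow down later EXECUTEs.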
Regards
Takayuki Tsunakawa

Attachment: faster-locallock-scan.patch