The following tests demonstrate the speedup that may be achieved with my patch:
1. Add to postgresql.conf:

   enable_hashjoin = OFF
   enable_mergejoin = OFF
   enable_material = OFF
   enable_bitmapscan = OFF
   enable_nestloop = ON
   max_parallel_workers_per_gather = 0
   enable_memoize = OFF

2. Create the test tables:

   create table test_part(id int primary key) partition by range(id);
   create table test_part0 partition of test_part for values from (0) to (1000000);
   create table test_part1 partition of test_part for values from (1000000) to (2000000);
   insert into test_part select id from generate_series(0, 1000000-1) as id;

   create table ids(id int, name varchar);
   create index on ids(ascii(name));
   insert into ids select id, 'worst case' as name from generate_series(0, 1000000-1) as id;
   insert into ids select 123456, 'best case' as name from generate_series(0, 1000000-1) as id;

3. Run the tests:

   explain analyze select * from ids join test_part on ids.id=test_part.id
       where ascii(ids.name)=ascii('best case');
   explain analyze select * from ids join test_part on ids.id=test_part.id
       where ascii(ids.name)=ascii('worst case');

The average results on my machine are as follows:

              | vanilla postgres | patched postgres
   best case  | 2286 ms          | 1924 ms
   worst case | 2278 ms          | 2360 ms

So far I haven't refactored the patch as per David's advice. I first want to
understand whether we need such an optimization at all.

On Thu, 23 Mar 2023 at 17:05, David Rowley <dgrowle...@gmail.com> wrote:

> On Thu, 23 Mar 2023 at 19:46, Alexandr Nikulin
> <alexandr.s.niku...@gmail.com> wrote:
> > I propose to slightly improve the performance of nested loop join in
> > the case of a partitioned inner table.
> > As I see in the code, the backend looks for the partition of the inner
> > table each time after fetching a new row from the outer table.
> > These searches can take a significant amount of time.
> > But we can skip this step if the nested loop parameter(s) have not
> > changed since the previous row was fetched from the outer table.
>
> I think if we were to do something like that, then it should be done
> in nodeAppend.c and nodeMergeAppend.c. That way you can limit it to
> only checking parameters that partition pruning needs to care about.
> That does mean you'd need to find somewhere to cache the parameter
> values, however. Doing it in nodeNestloop.c means you're doing it when
> the inner subplan is something that does not suffer from the
> additional overhead you want to avoid, e.g. an Index Scan.
>
> Also, generally, if you want to get anywhere with a performance patch,
> you should post performance results from before and after your change.
> Also include your benchmark setup and relevant settings for how you
> got those results. For this case, you'll want a best case (parameter
> value stays the same) and a worst case, where the parameter value
> changes on each outer row. I expect you're going to add overhead to
> this case, as your additional checks will always detect that the
> parameter has changed, which will always require partition pruning to
> be executed again. We'll want to know if that overhead is high enough
> for us not to want to do this.
>
> I'll be interested to see a test that has some varlena parameter of,
> say, a few hundred bytes, to see how much overhead testing if that
> parameter has changed adds when the pruning is being done on a HASH
> partitioned table. HASH partitioning should prune quite a bit faster
> than both LIST and RANGE as the hashing is effectively O(1) vs
> O(log2 N) (N being the number of Datums in the partition bounds). I'd
> expect a measurable additional overhead with the patch when the
> parameter changes on each outer row.
>
> David
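
For reference, a rough sketch of the HASH-partitioning test with a varlena
join key of a few hundred bytes could look like the following. All object
names, the modulus, and the row counts are only illustrative; this is not
taken from the patch, and it assumes the same planner settings as in step 1
above so that a nested loop with run-time pruning on the inner Append is
chosen.

   -- Illustrative setup for the suggested HASH-partitioning / varlena test.
   create table hash_part(key text primary key) partition by hash(key);
   create table hash_part0 partition of hash_part for values with (modulus 4, remainder 0);
   create table hash_part1 partition of hash_part for values with (modulus 4, remainder 1);
   create table hash_part2 partition of hash_part for values with (modulus 4, remainder 2);
   create table hash_part3 partition of hash_part for values with (modulus 4, remainder 3);

   -- Outer table whose join key is a ~300-byte text value.
   create table outer_keys(key text, name varchar);
   create index on outer_keys(ascii(name));

   -- Worst case: the pruning parameter changes on every outer row.
   insert into outer_keys
     select repeat('x', 300) || id, 'worst case' from generate_series(0, 1000000-1) as id;
   -- Best case: the pruning parameter stays the same for every outer row.
   insert into outer_keys
     select repeat('x', 300) || '123456', 'best case' from generate_series(0, 1000000-1) as id;

   insert into hash_part select distinct key from outer_keys;

   explain analyze select * from outer_keys join hash_part on outer_keys.key = hash_part.key
       where ascii(outer_keys.name) = ascii('best case');
   explain analyze select * from outer_keys join hash_part on outer_keys.key = hash_part.key
       where ascii(outer_keys.name) = ascii('worst case');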