Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Önder Kalacı Tue, 20 Sep 2022 03:30:02 -0700

Hi Peter,



>
> 1. src/backend/executor/execReplication.c - build_replindex_scan_key
>
> - * This is not generic routine, it expects the idxrel to be replication
> - * identity of a rel and meet all limitations associated with that.
> + * This is not generic routine, it expects the idxrel to be an index
> + * that planner would choose if the searchslot includes all the columns
> + * (e.g., REPLICA IDENTITY FULL on the source).
>   */
> -static bool
> +static int
>  build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
>   TupleTableSlot *searchslot)
>
>
> (I know this is not caused by your patch but maybe fix it at the same
> time?)
>
> "This is not generic routine, it expects..." -> "This is not a generic
> routine - it expects..."
>
>
Fixed


>
> 2.
>
> + IndexInfo  *indexInfo PG_USED_FOR_ASSERTS_ONLY;
> +
> + /*
> + * This attribute is an expression, and
> + * SuitablePathsForRepIdentFull() was called earlier while the
> + * index for subscriber is selected. There, the indexes comprising
> + * *only* expressions have already been eliminated.
> + *
> + * We sanity check this now.
> + */
> +#ifdef USE_ASSERT_CHECKING
> + indexInfo = BuildIndexInfo(idxrel);
> + Assert(!IndexOnlyOnExpression(indexInfo));
> +#endif
>
> 2a.
> "while the index for subscriber is selected..." -> "when the index for
> the subscriber was selected...”
>
>
fixed


> ~
>
> 2b.
> Because there is only one declaration in this code block you could
> simplify this a bit if you wanted to.
>
> SUGGESTION
> /*
>  * This attribute is an expression, and
>  * SuitablePathsForRepIdentFull() was called earlier while the
>  * index for subscriber is selected. There, the indexes comprising
>  * *only* expressions have already been eliminated.
>  *
>  * We sanity check this now.
>  */
> #ifdef USE_ASSERT_CHECKING
> IndexInfo  *indexInfo = BuildIndexInfo(idxrel);
> Assert(!IndexOnlyOnExpression(indexInfo));
> #endif
>
>
Makes sense, no reason to declare above


> ~~~
>
> 3. src/backend/executor/execReplication.c - RelationFindReplTupleByIndex
>
> + /* Start an index scan. */
> + scan = index_beginscan(rel, idxrel, &snap, skey_attoff, 0);
>  retry:
>   found = false;
>
> It might be better to have a blank line before that ‘retry’ label,
> like in the original code.
>

agreed, fixed


>
> ======
>
> 4. src/backend/replication/logical/relation.c
>
> +/* see LogicalRepPartMapEntry for details in logicalrelation.h */
>  static HTAB *LogicalRepPartMap = NULL;
>
> Personally, I'd word that something like:
> "/* For LogicalRepPartMap details see LogicalRepPartMapEntry in
> logicalrelation.h */"
>
> but YMMV.
>

I also don't have any strong opinions on that, updated to your suggestion.


>
> ~~~
>
> 5. src/backend/replication/logical/relation.c -
> GenerateDummySelectPlannerInfoForRelation
>
> +/*
> + * This is not a generic function, helper function for
> + * GetCheapestReplicaIdentityFullPath. The function creates
> + * a dummy PlannerInfo for the given relationId as if the
> + * relation is queried with SELECT command.
> + */
> +static PlannerInfo *
> +GenerateDummySelectPlannerInfoForRelation(Oid relationId)
>
> (mentioned this one in my previous post)
>
> "This is not a generic function, helper function" -> "This is not a
> generic function. It is a helper function"
>

Yes, applied.


>
> ~~~
>
> 6. src/backend/replication/logical/relation.c -
> GetCheapestReplicaIdentityFullPath
>
> +/*
> + * Generate all the possible paths for the given subscriber relation,
> + * for the cases that the source relation is replicated via REPLICA
> + * IDENTITY FULL. The function returns the cheapest Path among the
> + * eligible paths, see SuitablePathsForRepIdentFull().
> + *
> + * The function guarantees to return a path, because it adds sequential
> + * scan path if needed.
> + *
> + * The function assumes that all the columns will be provided during
> + * the execution phase, given that REPLICA IDENTITY FULL guarantees
> + * that.
> + */
> +static Path *
> +GetCheapestReplicaIdentityFullPath(Relation localrel)
>
>
> "for the cases that..." -> "for cases where..."
>
>
sounds good


> ~~~
>
> 7.
>
> + /*
> + * Currently it is not possible for planner to pick a partial index or
> + * indexes only on expressions. We still want to be explicit and eliminate
> + * such paths proactively.
>
> "for planner..." -> "for the planner..."
>

fixed


>
> ======
>
> 8. .../subscription/t/032_subscribe_use_index.pl - general
>
> 8a.
> (remove the 'we')
> "# we wait until..." -> "# wait until..." X many occurrences
>
> ~
>
> 8b.
> (remove the 'we')
> "# we show that..." -> “# show that..." X many occurrences
>

Ok, removed all "we"s in the test

>
> ~~~
>
> 9.
>
> There is inconsistent wording for some of your test case start/end comments
>
> 9a.
> e.g.
> start: SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS
> end: SUBSCRIPTION USES INDEX MODIFIES MULTIPLE ROWS
>
> ~
>
> 9b.
> e.g.
> start: SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS
> end: SUBSCRIPTION USES INDEX MODIFIES MULTIPLE ROWS
>
>
thanks, fixed all



> ~~~
>
> 10.
>
> I did not really understand the point of having special subscription names
> tap_sub_rep_full_0
> tap_sub_rep_full_2
> tap_sub_rep_full_3
> tap_sub_rep_full_4
> etc...
>
> Since you drop/recreate these for each test case can't they just be
> called 'tap_sub_rep_full'?
>
>
There is no special reason for that, updated all to tap_sub_rep_full.

I think I initially made it in order to distinguish certain error messages
in the tests, but then we already have unique messages regardless of the
subscription name.


> ~~~
>
> 11. SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS
>
> +# updates 200 rows
> +$node_publisher->safe_psql('postgres',
> + "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");
>
> The comment says update but this is doing delete
>
>
fixed


>
> ~~~
>
> 12. SUBSCRIPTION USES INDEX WITH DROPPED COLUMNS
>
> +# cleanup sub
> +$node_subscriber->safe_psql('postgres',
> + "DROP SUBSCRIPTION tap_sub_rep_full_4");
>
> Unusual wrapping?
>

Fixed


>
> ~~~
>
> 13. SUBSCRIPTION USES INDEX ON PARTITIONED TABLES
>
> +# updates rows and moves between partitions
> +$node_publisher->safe_psql('postgres',
> + "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
> +$node_publisher->safe_psql('postgres',
> + "DELETE FROM users_table_part WHERE user_id = 12 and value_1 = 12;");
>
> The comment says update but SQL says delete
>
>
fixed


> ~~~
>
> 14. SUBSCRIPTION CAN USE INDEXES WITH EXPRESSIONS AND COLUMNS
>
> +# update 1 row and delete 1 row using index_b, so index_a still has 2
> idx_scan
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
> indexrelname = 'index_a';}
> +) or die "Timed out while waiting for check subscriber
> tap_sub_rep_full_0 updates two rows via index scan with index on high
> cardinality column-3";
> +
>
> The comment seems misplaced. Doesn't it belong on the lines above this
> where the update/delete is being done?
>
>
Yes, it seems so. moved


> ~~~
>
> 15. SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE - INHERITED
> TABLE
>
> +# ANALYZING child will change the index used on child_1 and going to
> use index_on_child_1_b
> +$node_subscriber->safe_psql('postgres', "ANALYZE child_1");
>
> Should the comment say 'child_1' instead of child?
>
> ------
>

Seems better, changed.

Thanks for the reviews, attached v12.

Onder Kalaci

v12_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Reply via email to