Re: [HACKERS] Declarative partitioning - another take

Tomas Vondra Wed, 14 Dec 2016 06:54:58 -0800

Hi Amit,

On 12/13/2016 09:45 AM, Amit Langote wrote:

On 2016/12/13 0:17, Tomas Vondra wrote:

On 12/12/2016 07:37 AM, Amit Langote wrote:


Hi Tomas,

On 2016/12/12 10:02, Tomas Vondra wrote:


2) I'm wondering whether having 'table' in the catalog name (and also in
the new relkind) is too limiting. I assume we'll have partitioned indexes
one day, for example - do we expect to use the same catalogs?


I am not sure I understand your idea of partitioned indexes, but I doubt
it would require entries in the catalog under consideration.  Could you
perhaps elaborate more?


OK, let me elaborate. Let's say we have a partitioned table, and I want to
create an index. The index may be either "global" i.e. creating a single
relation for data from all the partitions, or "local" (i.e. partitioned
the same way as the table).

Local indexes are easier to implement (it's essentially what we have now,
except that we need to create the indexes manually for each partition),
and don't work particularly well for some use cases (e.g. unique
constraints). This is what I mean by "partitioned indexes".

If the index is partitioned just like the table, we probably won't need to
copy the partition key info (so, nothing in pg_partitioned_table).
I'm not sure it makes sense to partition the index differently than the
table - I don't see a case where that would be useful.

The global indexes would work better for the unique constraint use case,
but it clearly contradicts our idea of TID (no information about which
partition that references).

So maybe the catalog really only needs to track info about tables? Not
sure. I'm just saying it'd be unfortunate to have _table in the name, and
end up using it for indexes too.


Hmm, I didn't quite think of the case where the index is partitioned
differently from the table, but perhaps that's possible with some other
databases.

I haven't thought about that very deeply either, so perhaps it's anentirely silly idea. Also, probably quite complex to implement I guess,so unlikely to be pursued soon.

What you describe as "local indexes" or "locally partitioned indexes" is
something I would like to see being pursued in the near term.  In that
case, we would allow defining indexes on the parent that are recursively
defined on the partitions and marked as inherited index, just like we have
inherited check constraints and NOT NULL constraints.  I have not studied
whether we could implement (globally) *unique* indexes with this scheme
though, wherein the index key is a superset of the partition key.

I think implementing UNIQUE constraint with local indexes is possibleand possibly even fairly simple, but it likely requires SHARE lock onall partitions, which is not particularly nice.

When the partition key is referenced in the constraint, that may allowlocking only a subset of the partitions, possibly even a single one. Butwith multi-level partitioning schemes that may be difficult.

Also, I don't think it's very likely to have the partitioning key aspart of the unique constraint. For example 'users' table is unlikely tobe distributed by 'login' and so on.

The global indexes make this easier, because there's just a singleindex to check. But of course, attaching/detaching partitions gets moreexpensive.

Anyway, starting a detailed discussion about local/global indexes wasnot really what I meant to do.

Clearly, this is a consequence of building the partitioning on top of
inheritance (not objecting to that approach, merely stating a fact).

I'm fine with whatever makes the error messages more consistent, if it
does not make the code significantly more complex. It's a bit confusing
when some use 'child tables' and others 'partitions'. I suspect even a
single DML command may return a mix of those, depending on where exactly
it fails (old vs. new code).


So, we have mostly some old DDL (CREATE/ALTER TABLE) and maintenance
commands that understand inheritance.  All of the their error messages
apply to partitions as well, wherein they will be referred to as "child
tables" using old terms.  We now have some cases where the commands cause
additional error messages for only partitions because of additional
restrictions that apply to them.  We use "partitions" for them because
they are essentially new error messages.

There won't be a case where single DML command would mix the two terms,
because we do not allow mixing partitioning and regular inheritance.
Maybe I misunderstood you though.

Don't we call inheritance-related functions from the new DDL? In thatcase we'd fail with 'child tables' error messages in the old code, and'partitions' in the new code. I'd be surprised if there was no such codereuse, but I haven't checked.

Am I right that one of the ambitions of the new partitioning is to improve
behavior with large number of partitions?


Yes.  Currently, SELECT planning is O(n) with significantly large constant
factor.  It is possible now to make it O(log n).  Also, if we can do away
with inheritance_planner() treatment for the *partitioned tables* in case
of UPDATE/DELETE, then that would be great.  That would mean their
planning time would be almost same as the SELECT case.

As you might know, we have volunteers to make this happen sooner [1], :)

Yes, I know. And it's great that you've managed to make the first step,getting all the infrastructure in, allowing others to build on that.Kudos to you!

At first I thought it's somewhat related to the FDW sharding (each node
being a partition and having local subpartitions), but I realize the
planner will only deal with the node partitions I guess.


Yeah, planner would only have the local partitioning metadata at its
disposal.  Foreign tables can only be leaf partitions, which if need to be
scanned for a given query, will be scanned using a ForeignScan.

Right, makes sense. Still, I can imagine for example having many dailypartitions and not having to merge them regularly just to reduce thenumber of partitions.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Declarative partitioning - another take

Reply via email to