Re: Ordered Partitioned Table Scans

Julien Rouhaud Wed, 19 Dec 2018 02:26:50 -0800

On Wed, Dec 19, 2018 at 10:51 AM David Rowley
<david.row...@2ndquadrant.com> wrote:
>
> On Wed, 19 Dec 2018 at 20:40, Julien Rouhaud <rjuju...@gmail.com> wrote:
>
> > If I understand correctly, the new behavior is controlled by
> > partitions_are_ordered(), but it only checks for declared partitions,
> > not partitions that survived pruning.  Did I miss something or is it
> > the intended behavior?
>
> Yeah, it was mentioned up-thread a bit.
>
> I wrote:
> > I retrospectively read that thread after Amit mentioned about your
> > patch. I just disagree with Robert about caching this flag.  The
> > reason is, if the flag is false due to some problematic partitions, if
> > we go and prune those, then we needlessly fail to optimise that case.
> > I propose we come back and do the remaining optimisations with
> > interleaved LIST partitions and partitioned tables with DEFAULT
> > partitions later,  once we have a new "live_parts" field in
> > RelOptInfo.  That way we can just check the live parts to ensure
> > they're compatible with the optimization. If we get what's done
> > already in then we're already a bit step forward.


Ah, sorry I did read this but I misunderstood it.  I really need to
catchup what changed for partitioning since pg11 more thoroughly.

> The reason I'm keen to leave this alone today is that determining
> which partitions are pruned requires looking at each partition's
> RelOptInfo and checking if it's marked as a dummy rel. I'm trying to
> minimise the overhead of this patch by avoiding doing any
> per-partition processing. If we get the "live_parts" Bitmapset, then
> this becomes cheaper as Bitmapsets are fairly efficient at finding the
> next set member, even when they're large and sparsely populated.

I see.  But since for now the optimisation will only be done
considering all partitions, I still think that it's better to store a
bool flag in the PartitionDesc to describe if it's natively ordered or
not, and therefore also handle the case for
non-intervleaved-multi-datums list partitioning.  It won't add much
overhead and will benefit way more cases.

We can still revisit that when a live_parts Bitmapset is available in
RelOptInfo (and maybe other flag that say if partitions were pruned or
not, and/or if the default partition was pruned).

> > Also, generate_mergeappend_paths should
> > probably be renamed to something like generate_sortedappend_paths
> > since it can now generate either Append or MergeAppend paths.
>
> You might be right about naming this something else, but
> "sortedappend" sounds like an Append node with a Sort node above it.
> "orderedappend" feels slightly better, although my personal vote would
> be not to rename it at all. Sometimes generating an Append seems like
> an easy enough corner case to mention in the function body.

Ok, I don't have a very strong opinion on it and orderedappend sounds
less ambiguous.

> > I'm also wondering if that's ok to only generate either a (sorted)
> > Append or a MergeAppend.  Is it possible that in some cases it's
> > better to have a MergeAppend rather than a sorted Append, given that
> > MergeAppend is parallel-aware and the sorted Append isn't?
>
> That might have been worth a thought if we had parallel MergeAppends,
> but we don't. You might be thinking of GatherMerge.

Ah, oups indeed :)

Re: Ordered Partitioned Table Scans

Reply via email to