On Wed, Dec 19, 2018 at 10:51 AM David Rowley <david.row...@2ndquadrant.com> wrote: > > On Wed, 19 Dec 2018 at 20:40, Julien Rouhaud <rjuju...@gmail.com> wrote: > > > If I understand correctly, the new behavior is controlled by > > partitions_are_ordered(), but it only checks for declared partitions, > > not partitions that survived pruning. Did I miss something or is it > > the intended behavior? > > Yeah, it was mentioned up-thread a bit. > > I wrote: > > I retrospectively read that thread after Amit mentioned about your > > patch. I just disagree with Robert about caching this flag. The > > reason is, if the flag is false due to some problematic partitions, if > > we go and prune those, then we needlessly fail to optimise that case. > > I propose we come back and do the remaining optimisations with > > interleaved LIST partitions and partitioned tables with DEFAULT > > partitions later, once we have a new "live_parts" field in > > RelOptInfo. That way we can just check the live parts to ensure > > they're compatible with the optimization. If we get what's done > > already in then we're already a bit step forward.
Ah, sorry I did read this but I misunderstood it. I really need to catchup what changed for partitioning since pg11 more thoroughly. > The reason I'm keen to leave this alone today is that determining > which partitions are pruned requires looking at each partition's > RelOptInfo and checking if it's marked as a dummy rel. I'm trying to > minimise the overhead of this patch by avoiding doing any > per-partition processing. If we get the "live_parts" Bitmapset, then > this becomes cheaper as Bitmapsets are fairly efficient at finding the > next set member, even when they're large and sparsely populated. I see. But since for now the optimisation will only be done considering all partitions, I still think that it's better to store a bool flag in the PartitionDesc to describe if it's natively ordered or not, and therefore also handle the case for non-intervleaved-multi-datums list partitioning. It won't add much overhead and will benefit way more cases. We can still revisit that when a live_parts Bitmapset is available in RelOptInfo (and maybe other flag that say if partitions were pruned or not, and/or if the default partition was pruned). > > Also, generate_mergeappend_paths should > > probably be renamed to something like generate_sortedappend_paths > > since it can now generate either Append or MergeAppend paths. > > You might be right about naming this something else, but > "sortedappend" sounds like an Append node with a Sort node above it. > "orderedappend" feels slightly better, although my personal vote would > be not to rename it at all. Sometimes generating an Append seems like > an easy enough corner case to mention in the function body. Ok, I don't have a very strong opinion on it and orderedappend sounds less ambiguous. > > I'm also wondering if that's ok to only generate either a (sorted) > > Append or a MergeAppend. Is it possible that in some cases it's > > better to have a MergeAppend rather than a sorted Append, given that > > MergeAppend is parallel-aware and the sorted Append isn't? > > That might have been worth a thought if we had parallel MergeAppends, > but we don't. You might be thinking of GatherMerge. Ah, oups indeed :)