On Fri, Jul 5, 2019 at 4:31 AM Pavel Stehule <pavel.steh...@gmail.com> wrote:

> The first issue is unstable regress tests - there is a problem with opr_sanity
I would prefer to avoid needing to add anything to opr_sanity, really. A
multirange would let me achieve that, I think. But otherwise I'll add the
ordering. Thanks!

> + rangeTypeId = get_fn_expr_argtype(fcinfo->flinfo, 1);
> + if (!type_is_range(rangeTypeId))
> + {
> + ereport(ERROR, (errmsg("range_agg must be called with a range")));
> + }

I think this was left over from my attempts to support user-defined ranges.
I believe I can remove it now.

> +# Making 2- and 3-param range_agg polymorphic is difficult
> +# because it would take an anyrange and return an anyrange[],
> +# which doesn't exist.
> +# As a workaround we define separate functions for each built-in range type.
> +# This is what causes the mess in src/test/regress/expected/opr_sanity.out.
> +{ oid => '8003', descr => 'aggregate transition function',
>
> This is strange. Does it mean that range_agg will not work with custom range
> types?

You would have to define your own range_agg with your own custom type in the
signature. I gave instructions about that in my range_agg extension
(https://github.com/pjungwir/range_agg), but it's not as easy with range_agg
in core, because I don't think you can define a custom function that is
backed by a built-in C function. At least I couldn't get that to work.

The biggest argument for a multirange to me is that we could have
anymultirange, with similar rules as anyarray. That way I could have
`range_agg(r anyrange) RETURNS anymultirange`. [1] is a conversation where I
asked about doing this before multiranges were suggested. Also, I'm aware of
your own recent work on polymorphic types at [2], but I haven't read it
closely enough to see if it would help me here. Do you think it applies?

> I am not sure about implementation. I think so accumulating all ranges,
> sorting and processing on the end can be memory and CPU expensive.

I did some research and couldn't find any algorithm that was better than
O(n log n), but if you're aware of one I'd like to know about it.
Assuming we can't improve on the complexity bounds, I think a sort +
iteration is desirable because it keeps things simple and benefits from past
& future work on the sorting code. I care more about optimizing time here
than RAM, especially since we'll use this in FK checks, where the inputs
will rarely number more than a few and probably never be in the millions.

We especially want to avoid an O(n^2) algorithm, which would be easy to
stumble into if we naively accumulated the result as we go. For example, if
we had multiranges you could imagine an implementation that just did
`result + r` for each input r. But that would have the same O(n^2) pattern
as looping over strcat. We could use a tree to keep things sorted and
accumulate as we go, but the implementation would be more complicated and I
think slower.

Thanks for the review!

Paul

[1] https://www.postgresql.org/message-id/CA%2BrenyVOjb4xQZGjdCnA54-1nzVSY%2B47-h4nkM-EP5J%3D1z%3Db9w%40mail.gmail.com
[2] https://www.postgresql.org/message-id/cafj8prdna7vqni8gr+tt2ktmz0cq5g93guc3sbn_nvpldxa...@mail.gmail.com