The spec defines two types of aggregate function classed as "ordered set function", as follows: 1. An "inverse distribution function" taking one argument (which must be a grouped column or otherwise constant within groups) plus a sorted group with exactly one column: =# SELECT (func(p) WITHIN GROUP (ORDER BY q)) from ... The motivating example for this (and the only ones in the spec) are percentile_cont and percentile_disc, to return a percentile result from a continuous or discrete distribution. (Thus percentile_cont(0.5) within group (order by x) is the spec's version of a median(x) function.) 2. A "hypothetical set function" taking N arguments of arbitrary types (a la VARIADIC "any", rather than a fixed list) plus a sorted group with N columns of matching types: =# SELECT (func(p1,p2,...) WITHIN GROUP (ORDER BY q1,q2,...)) from ... (where typeof(p1)==typeof(q1) and so on, at least up to trivial conversions) The motivating example here is to be able to do rank(p1,p2,...) to return the rank that the specified values would have had if they were added to the group. As usual, we do not want to constrain ourselves to supporting only the specific cases in the spec, but would prefer a general solution. We (meaning myself and Atri) have an implementation that basically works, though it is not yet complete, but before taking it any further we need to resolve the design question of how to represent these two types of function in the system catalogs. The fact that there are in effect two parts to the parameter list, which are either independent (for inverse distribution funcs) or closely related (for hypothetical set functions), doesn't seem to point to an obvious way to represent this in pg_proc/pg_aggregate. I'm not yet satisfied with the method used in our implementation, so we're throwing this open for suggestions. We will post the work-in-progress patch along with a description of its current implementation shortly. One of the major complications is that we ideally want to be able to do polymorphism based on the type of the sorted group, specifically in order to be able to do percentile_disc(float8) within group (order by anyelement) returning anyelement. (i.e. we should be able to get a discrete percentile from any type that is orderable.) The question here is how to resolve the return type both of the aggregate itself and of the finalfn. We've also had an expression of interest in extending this to allow percentile_disc(float8[]) and percentile_cont(float8[]) returning arrays; e.g. percentile_cont(array[0, 0.25, 0.5, 0.75, 1]) to return an array containing the bounds, median and quartiles in one go. This is an extension to the spec but it seems sufficiently obviously useful to be worth supporting.
Comments? -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers