Thanks David. Similar to your "TOP() and BOTTOM() aggregate" idea, you might find Erwin Brandstetter's solution using the LAST() aggregate function interesting: ( https://dba.stackexchange.com/a/324646/100880)
If the FIRST_LAST_AGG extension is installed, then we can do something like this: SELECT country , count(*) AS ct_cities , max(population) AS highest_population , last(city ORDER BY population, city) AS biggest_city -- !FROM citiesGROUP BY countryHAVING count(*) > 1; -Ben On Mon, Mar 6, 2023 at 9:51 PM David Rowley <dgrowle...@gmail.com> wrote: > On Tue, 7 Mar 2023 at 12:40, Tom Lane <t...@sss.pgh.pa.us> wrote: > > > > Ben Clements <benhasgonewalk...@gmail.com> writes: > > > As shown above, the following calculated column can bring in the city > name, > > > even though the city name isn't in the GROUP BY: > > > max(city) keep (dense_rank first order by population desc) > > > > You haven't really explained what this does, let alone why it can't > > be implemented with existing features such as FILTER and ORDER BY. > > (It wasn't clear to me until I watched the youtube video.) > > Likely KEEP is more flexible than just the given example but I think > that something similar to the example given could be done by inventing > a TOP() and BOTTOM() aggregate. Then you could write something like: > > select > country, > count(*), > max(population), > bottom(city, population) > from > cities > group by > country > having > count(*) > 1 > > the transfn for bottom() would need to remember the city and the > population for the highest yet seen value of the 2nd arg. The > combinefn would need to find the aggregate state with the highest 2nd > arg value, the finalfn would just spit out the column that's stored in > the state. Where this wouldn't work would be if multiple columns were > required to tiebreak the sort. > > You could at least parallelize the aggregation this way. If there were > to be some form of ORDER BY in the aggregate then no parallelization > would be possible. I'd assume since the whole thing can be done with > a subquery that the entire point of having special syntax for this > would be because we don't want to pay the price of looking at the > table twice, i.e. performance must matter, so the ability to have > parallel aggregates here seems good. > > I can't quite think of a way to have parallel query and an arbitrarily > long list of columns to sort on... > > For Ben, we do tend to shy away from copying other RDBMS's extensions > to the SQL language. The problem is that copying these can cause > problems in the future if/when the standard adopts that syntax with > variations or invents something else that conflicts with the grammar > that we've added. One example of something we didn't do was Oracle's > CONNECT BY. Eventually, the SQL standard got WITH RECURSIVE to allow > queries on hierarchical data. Of course, we do have many of our own > extensions to the standard, so we certainly do make exceptions > sometimes. So, don't be too surprised that there's some discussion of > other methods which might make this work which don't involve copying > what someone else has done. > > David >