[SQL] Aggregates (last/first) not behaving

2010-07-21 Thread Wes Devauld
I have PostgreSQL 8.3.9 [PostgreSQL 8.3.9 on i386-apple-darwin10.3.0,
compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build
5646) (dot 1)]

and the custom first and last aggregates from:

http://wiki.postgresql.org/wiki/First_(aggregate)
http://wiki.postgresql.org/wiki/Last_(aggregate)

I have a simple table, of two columns.  The first is a timestamp and is the
primary key, the second is an integer.  I've loaded the table up with
values, one for every minute, for a whole year.  Some SQL to recreate the
table and the aggregates can be retrieved from:

http://blog.devauld.ca/aggregate_test.zip (File is approximately 180KB)

Now when I try to make use of the first and last aggregates, I get:

# select first(t), last(t) from test group by extract(day from t);
first|last
-+-
 2009-01-01 00:00:00 | 2009-01-01 17:02:00
 2009-01-02 10:07:00 | 2009-01-02 10:06:00
 2009-01-03 20:15:00 | 2009-01-03 20:14:00
 2009-01-04 00:00:00 | 2009-01-04 23:59:00
 2009-01-05 00:00:00 | 2009-01-05 23:59:00
 2009-01-06 16:31:00 | 2009-01-06 16:30:00
 2009-01-07 00:00:00 | 2009-01-07 23:49:00
 2009-01-08 11:09:00 | 2009-01-08 11:42:00
 2009-01-09 11:08:00 | 2009-01-09 00:51:00
 2009-01-10 11:33:00 | 2009-01-10 23:37:00
 2009-01-11 13:05:00 | 2009-01-11 23:59:00
 2009-01-12 23:55:00 | 2009-01-12 23:47:00
 2009-01-13 01:50:00 | 2009-01-13 23:36:00
 2009-01-14 23:55:00 | 2009-01-14 23:41:00
 2009-01-15 00:47:00 | 2009-01-15 23:40:00
 2009-01-16 00:29:00 | 2009-01-16 23:38:00
 2009-01-17 00:09:00 | 2009-01-17 23:37:00
 2009-01-18 23:48:00 | 2009-01-18 23:37:00
 2009-01-19 23:56:00 | 2009-01-19 23:39:00
 2009-01-20 07:14:00 | 2009-01-20 23:36:00
 2009-01-21 23:40:00 | 2009-01-21 23:41:00
 2009-01-22 02:57:00 | 2009-01-22 23:40:00
 2009-01-23 23:56:00 | 2009-01-23 23:38:00
 2009-01-24 09:34:00 | 2009-01-24 23:37:00
 2009-01-25 23:50:00 | 2009-01-25 23:37:00
 2009-01-26 23:48:00 | 2009-01-26 23:39:00
 2009-01-27 06:36:00 | 2009-01-27 23:37:00
 2009-01-28 23:59:00 | 2009-01-28 23:41:00
 2009-01-29 16:12:00 | 2009-01-29 23:40:00
 2009-01-30 21:11:00 | 2009-01-30 23:39:00
 2009-01-31 20:12:00 | 2009-01-31 16:20:00
(31 rows)

For some reason the aggregates are not falling into the proper group.  I
can't blame timezones as the results are all over the map, and first/last
relationship is broken as in some cases 'last' is chronologically before
'first'

If I explicitly retrieve the values for midnight each day:

# select t, v from test where extract(hour from t) = 0 and extract(minute
from t) = 0;
  t  |   v
-+---
 2009-01-01 00:00:00 | 0
 2009-01-02 00:00:00 |  1440
 2009-01-03 00:00:00 |  2880
 2009-01-04 00:00:00 |  4320
 2009-01-05 00:00:00 |  5760
 2009-01-06 00:00:00 |  7200
 2009-01-07 00:00:00 |  8640
 2009-01-08 00:00:00 | 10080
 2009-01-09 00:00:00 | 11520
 2009-01-10 00:00:00 | 12960
 2009-01-11 00:00:00 | 14400
 2009-01-12 00:00:00 | 15840
 2009-01-13 00:00:00 | 17280
 2009-01-14 00:00:00 | 18720
 2009-01-15 00:00:00 | 20160
 2009-01-16 00:00:00 | 21600
 2009-01-17 00:00:00 | 23040
 2009-01-18 00:00:00 | 24480
 2009-01-19 00:00:00 | 25920
 2009-01-20 00:00:00 | 27360
 2009-01-21 00:00:00 | 28800
 2009-01-22 00:00:00 | 30240
 2009-01-23 00:00:00 | 31680
 2009-01-24 00:00:00 | 33120
 2009-01-25 00:00:00 | 34560
 2009-01-26 00:00:00 | 36000
 2009-01-27 00:00:00 | 37440
 2009-01-28 00:00:00 | 38880
 2009-01-29 00:00:00 | 40320
 2009-01-30 00:00:00 | 41760
 2009-01-31 00:00:00 | 43200
(31 rows)

I get back the values for which I am seeking.  The pain is in finding the
last record in the day before.  I would have thought that grouping by
date_trunc on month would have yeilded similar results to above:

# select first(t), first(v) from test group by date_trunc('day', t);
first| first
-+---
 2009-01-01 00:00:00 | 0
 2009-01-02 10:07:00 |  2047
 2009-01-03 20:15:00 |  4095
 2009-01-04 00:00:00 |  4320
 2009-01-05 00:00:00 |  5760
 2009-01-06 17:33:00 |  8253
 2009-01-07 16:56:00 |  9656
 2009-01-08 17:28:00 | 11128
 2009-01-09 21:14:00 | 12794
 2009-01-10 05:47:00 | 13307
 2009-01-11 16:42:00 | 15402
 2009-01-12 16:30:00 | 16830
 2009-01-13 20:14:00 | 18494
 2009-01-14 23:59:00 | 20159
 2009-01-15 22:17:00 | 21497
 2009-01-16 23:57:00 | 23037
 2009-01-17 18:32:00 | 24152
 2009-01-18 20:15:00 | 25695
 2009-01-19 07:58:00 | 26398
 2009-01-20 22:16:00 | 28696
 2009-01-21 17:31:00 | 29851
 2009-01-22 16:37:00 | 31237
 2009-01-23 23:59:00 | 33119
 2009-01-24 21:13:00 | 34393
 2009-01-25 22:17:00 | 35897
 2009-01-26 16:42:00 | 37002
 2009-01-27 16:30:00 | 38430
 2009-01-28 16:52:00 | 39892
 2009-01-29 23:59:00 | 41759
 2009-01-30 10:19:00 | 42379
 2009-01-31 14:58:00 | 44098
(31 rows)

Looking at the plan:

# explain select first(t), first(v) from test group by date_trunc('day', t);
  QUERY PLAN
-

Re: [SQL] Aggregates (last/first) not behaving

2010-07-22 Thread Wes Devauld
I believe I lost the flavour of what I'm doing when I constructed this
example.  I'm not interested in the timepoint as much as the value that is
attached to it.  I need to be able to find the last chronological record for
a given day.

I can get the value for which I am looking in two steps:

select max(t) as t into table last_of_day from test group by extract(day
from t);
select last_of_day.t, test.v from last_of_day, test where last_of_day.t =
test.t;

I was fairly happy when first() and last() were discovered, as these two
steps could be merged, until the table grows too large and the query planner
decides to sort the results before they are aggregated.

I was searching for a way to keep using last() and keeping the extraction to
a single step, although the more I fight with it, the less I think that it
is worth it.  If you have any further suggestions, I would appreciate
hearing them.

-W

On Thu, Jul 22, 2010 at 2:44 AM, Richard Huxton  wrote:
>
>
> They all seem grouped properly (by day) to me. Unless I've missed
> something.
>
> The first/last aggregates aren't ordered in any way. They are "first value
> I happened to find" and "last value I happened to find".
>
> If you want the earliest/latest timestamp from each day, use min() and
> max().
>
> --
>  Richard Huxton
>  Archonet Ltd
>