Re: count(*) vs count(id)

Hellmuth Vargas Tue, 02 Feb 2021 12:45:53 -0800

Hello list

My English is not very good, so I pretend that through the examples you
understand what I intend to expose


-- Recreate the query that is supposedly wrong

select calendar.entry, count(*)
from generate_series('2020-08-20'::date, '2020-09-15'::date, '1
day'::interval) as calendar(entry)
left join (values      (1,'2020-08-28 09:44:11'::timestamp),
      (2,'2020-08-28 10:44:11'::timestamp),
      (3,'2020-08-29 11:44:11'::timestamp),
      (4,'2020-09-01 02:44:11'::timestamp),
      (5,'2020-09-02 03:44:11'::timestamp),
      (6,'2020-09-02 04:44:11'::timestamp),
      (7,'2020-09-03 05:44:11'::timestamp),
      (8,'2020-09-04 06:44:11'::timestamp),
      (10,'2020-09-04 07:44:11'::timestamp)) as call_records(id,timestamp)
on calendar.entry = call_records.timestamp::date
group by calendar.entry;

-- wrong???

         entry          | count
------------------------+-------
 2020-08-20 00:00:00-05 |     1
 2020-08-21 00:00:00-05 |     1
 2020-08-22 00:00:00-05 |     1
 2020-08-23 00:00:00-05 |     1
 2020-08-24 00:00:00-05 |     1
 2020-08-25 00:00:00-05 |     1
 2020-08-26 00:00:00-05 |     1
 2020-08-27 00:00:00-05 |     1
 2020-08-28 00:00:00-05 |     2
 2020-08-29 00:00:00-05 |     1
 2020-08-30 00:00:00-05 |     1
 2020-08-31 00:00:00-05 |     1
 2020-09-01 00:00:00-05 |     1
 2020-09-02 00:00:00-05 |     2
 2020-09-03 00:00:00-05 |     1
 2020-09-04 00:00:00-05 |     2
 2020-09-05 00:00:00-05 |     1
 2020-09-06 00:00:00-05 |     1
 2020-09-07 00:00:00-05 |     1
 2020-09-08 00:00:00-05 |     1
 2020-09-09 00:00:00-05 |     1
 2020-09-10 00:00:00-05 |     1
 2020-09-11 00:00:00-05 |     1
 2020-09-12 00:00:00-05 |     1
 2020-09-13 00:00:00-05 |     1
 2020-09-14 00:00:00-05 |     1
 2020-09-15 00:00:00-05 |     1

-- In the count I will only consider the records of call_records

select calendar.entry, count(call_records.*)
from generate_series('2020-08-20'::date, '2020-09-15'::date, '1
day'::interval) as calendar(entry)
left join (values      (1,'2020-08-28 09:44:11'::timestamp),
      (2,'2020-08-28 10:44:11'::timestamp),
      (3,'2020-08-29 11:44:11'::timestamp),
      (4,'2020-09-01 02:44:11'::timestamp),
      (5,'2020-09-02 03:44:11'::timestamp),
      (6,'2020-09-02 04:44:11'::timestamp),
      (7,'2020-09-03 05:44:11'::timestamp),
      (8,'2020-09-04 06:44:11'::timestamp),
      (10,'2020-09-04 07:44:11'::timestamp)) as call_records(id,timestamp)
on calendar.entry = call_records.timestamp::date
group by calendar.entry;


--- perfect
         entry          | count
------------------------+-------
 2020-08-20 00:00:00-05 |     0
 2020-08-21 00:00:00-05 |     0
 2020-08-22 00:00:00-05 |     0
 2020-08-23 00:00:00-05 |     0
 2020-08-24 00:00:00-05 |     0
 2020-08-25 00:00:00-05 |     0
 2020-08-26 00:00:00-05 |     0
 2020-08-27 00:00:00-05 |     0
 2020-08-28 00:00:00-05 |     2
 2020-08-29 00:00:00-05 |     1
 2020-08-30 00:00:00-05 |     0
 2020-08-31 00:00:00-05 |     0
 2020-09-01 00:00:00-05 |     1
 2020-09-02 00:00:00-05 |     2
 2020-09-03 00:00:00-05 |     1
 2020-09-04 00:00:00-05 |     2
 2020-09-05 00:00:00-05 |     0
 2020-09-06 00:00:00-05 |     0
 2020-09-07 00:00:00-05 |     0
 2020-09-08 00:00:00-05 |     0
 2020-09-09 00:00:00-05 |     0
 2020-09-10 00:00:00-05 |     0
 2020-09-11 00:00:00-05 |     0
 2020-09-12 00:00:00-05 |     0
 2020-09-13 00:00:00-05 |     0
 2020-09-14 00:00:00-05 |     0
 2020-09-15 00:00:00-05 |     0

when placing * I want to bring all the join records between both tables and
when counting them of course there will be a row for the dates   2020-08-30
,  2020-08-31 so the call_records fields are null

select *
from generate_series('2020-08-20'::date, '2020-09-15'::date, '1
day'::interval) as calendar(entry)
left join (values      (1,'2020-08-28 09:44:11'::timestamp),
      (2,'2020-08-28 10:44:11'::timestamp),
      (3,'2020-08-29 11:44:11'::timestamp),
      (4,'2020-09-01 02:44:11'::timestamp),
      (5,'2020-09-02 03:44:11'::timestamp),
      (6,'2020-09-02 04:44:11'::timestamp),
      (7,'2020-09-03 05:44:11'::timestamp),
      (8,'2020-09-04 06:44:11'::timestamp),
      (10,'2020-09-04 07:44:11'::timestamp)) as call_records(id,timestamp)
on calendar.entry = call_records.timestamp::date


         entry          | id |      timestamp
------------------------+----+---------------------
 2020-08-20 00:00:00-05 |    |
 2020-08-21 00:00:00-05 |    |
 2020-08-22 00:00:00-05 |    |
 2020-08-23 00:00:00-05 |    |
 2020-08-24 00:00:00-05 |    |
 2020-08-25 00:00:00-05 |    |
 2020-08-26 00:00:00-05 |    |
 2020-08-27 00:00:00-05 |    |
 2020-08-28 00:00:00-05 |  1 | 2020-08-28 09:44:11
 2020-08-28 00:00:00-05 |  2 | 2020-08-28 10:44:11
 2020-08-29 00:00:00-05 |  3 | 2020-08-29 11:44:11
 2020-08-30 00:00:00-05 |    |
 2020-08-31 00:00:00-05 |    |
 2020-09-01 00:00:00-05 |  4 | 2020-09-01 02:44:11
 2020-09-02 00:00:00-05 |  5 | 2020-09-02 03:44:11
 2020-09-02 00:00:00-05 |  6 | 2020-09-02 04:44:11
 2020-09-03 00:00:00-05 |  7 | 2020-09-03 05:44:11
 2020-09-04 00:00:00-05 |  8 | 2020-09-04 06:44:11
 2020-09-04 00:00:00-05 | 10 | 2020-09-04 07:44:11
 2020-09-05 00:00:00-05 |    |
 2020-09-06 00:00:00-05 |    |
 2020-09-07 00:00:00-05 |    |
 2020-09-08 00:00:00-05 |    |
 2020-09-09 00:00:00-05 |    |
 2020-09-10 00:00:00-05 |    |
 2020-09-11 00:00:00-05 |    |
 2020-09-12 00:00:00-05 |    |
 2020-09-13 00:00:00-05 |    |
 2020-09-14 00:00:00-05 |    |
 2020-09-15 00:00:00-05 |    |

--- counting

select entry, count(*)
from (
select *
from generate_series('2020-08-20'::date, '2020-09-15'::date, '1
day'::interval) as calendar(entry)
left join (values      (1,'2020-08-28 09:44:11'::timestamp),
      (2,'2020-08-28 10:44:11'::timestamp),
      (3,'2020-08-29 11:44:11'::timestamp),
      (4,'2020-09-01 02:44:11'::timestamp),
      (5,'2020-09-02 03:44:11'::timestamp),
      (6,'2020-09-02 04:44:11'::timestamp),
      (7,'2020-09-03 05:44:11'::timestamp),
      (8,'2020-09-04 06:44:11'::timestamp),
      (10,'2020-09-04 07:44:11'::timestamp)) as call_records(id,timestamp)
on calendar.entry = call_records.timestamp::date
) as u
group by entry

         entry          | count
------------------------+-------
 2020-08-20 00:00:00-05 |     1
 2020-08-21 00:00:00-05 |     1
 2020-08-22 00:00:00-05 |     1
 2020-08-23 00:00:00-05 |     1
 2020-08-24 00:00:00-05 |     1
 2020-08-25 00:00:00-05 |     1
 2020-08-26 00:00:00-05 |     1
 2020-08-27 00:00:00-05 |     1
 2020-08-28 00:00:00-05 |     2
 2020-08-29 00:00:00-05 |     1
 2020-08-30 00:00:00-05 |     1
 2020-08-31 00:00:00-05 |     1
 2020-09-01 00:00:00-05 |     1
 2020-09-02 00:00:00-05 |     2
 2020-09-03 00:00:00-05 |     1
 2020-09-04 00:00:00-05 |     2
 2020-09-05 00:00:00-05 |     1
 2020-09-06 00:00:00-05 |     1
 2020-09-07 00:00:00-05 |     1
 2020-09-08 00:00:00-05 |     1
 2020-09-09 00:00:00-05 |     1
 2020-09-10 00:00:00-05 |     1
 2020-09-11 00:00:00-05 |     1
 2020-09-12 00:00:00-05 |     1
 2020-09-13 00:00:00-05 |     1
 2020-09-14 00:00:00-05 |     1
 2020-09-15 00:00:00-05 |     1





El mar, 2 de feb. de 2021 a la(s) 03:31, Laurenz Albe (
laurenz.a...@cybertec.at) escribió:

> On Mon, 2021-02-01 at 19:14 -0600, Matt Zagrabelny wrote:
> > > > What is count(*) counting then? I thought it was rows.
> > >
> > > Yeah, but count(id) only counts rows where id isn't null.
> >
> > I guess I'm still not understanding it...
> >
> > I don't have any rows where id is null:
>
> Then the *result* of count(*) and count(id) will be the same.
>
> The asterisk in count(*) is misleading.  Different from any other
> programming language that I know, the SQL standard has decided that
> you cannot have an aggregate function without arguments.  You have
> to use the asterisk in that case.
>
> So count(*) really is count(), that is, it counts one for every
> row that it finds, no matter what the row contains.
>
> But count(id) includes a check: if "id IS NULL", it is not counted.
>
> If that condition is satisfied for all "id"s, you end up with
> the same count.  But count(id) is more expensive, because it
> will perform this unnecessary NULLness check for each row.
>
> In short: use count(*) if you want to count rows, and use
> count(x) if you want to count all rows where x IS NOT NULL.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>
>
>
>

-- 
Cordialmente,

Ing. Hellmuth I. Vargas S.

Re: count(*) vs count(id)

Reply via email to