Hi folks, We have a materialized view from which a customer reported some confusing/invalid results, leading us to inspect the query and not finding anything wrong. Running the query defining the matview manually, or creating a new (identical) materialized view returns the correct result. Obviously, we've done REFRESH MATERIALIZED VIEW just before doing the comparison, and all runs are in the same schema.
It's a pretty big query, but let's describe the two matviews to see that they are identical. The first is the original returning invalid results, the one with _2 name postfix is the re-created one. mm_prod=> \d+ segments_with_contacts Materialized view "aakpnews.segments_with_contacts" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+---------------+-----------+----------+---------+----------+--------------+------------- lid | integer | | | | plain | | sid | integer | | | | plain | | email | public.citext | | | | extended | | Indexes: "segments_with_contacts_sid_lid_email_idx" UNIQUE, btree (sid, lid, email) View definition: WITH tagged_contacts AS ( SELECT cl.lid, cl.email, cl.skip_preexisting_campaigns AS skip_subscribed, ct.skip_preexisting_campaigns AS skip_tags, cl.ladded, ct.tagname, ct.created FROM contacts_lists cl LEFT JOIN contacts_tags ct USING (email) WHERE cl.lstatus::bpchar = 'a'::bpchar ), tagged_segments AS ( SELECT s.lid, s.cid, s.sid, sp.type, sp.mid, sp.matchdelay, sp.tagname, sp.event, count(*) OVER (PARTITION BY s.sid) AS requirements, campaigns.activated_at FROM segments s LEFT JOIN campaigns USING (cid) JOIN segments_predicates sp USING (sid) WHERE s.archived_at IS NULL AND (s.cid IS NULL OR campaigns.activated_at IS NOT NULL) ), segments_contacts AS ( SELECT s.lid, s.sid, s.requirements, CASE WHEN s.type = 'subscribed'::public.predicate THEN ( SELECT array_agg(DISTINCT tagged_contacts.email::public.citext) AS array_agg FROM tagged_contacts WHERE tagged_contacts.lid = s.lid AND tagged_contacts.ladded >= s.activated_at AND (s.matchdelay IS NULL OR (tagged_contacts.ladded + s.matchdelay) < now()) AND NOT COALESCE(tagged_contacts.skip_subscribed, false)) WHEN s.type = 'has_tag'::public.predicate THEN ( SELECT array_agg(DISTINCT tagged_contacts.email::public.citext) AS array_agg FROM tagged_contacts WHERE tagged_contacts.lid = s.lid AND tagged_contacts.tagname OPERATOR(public.=) s.tagname AND (s.matchdelay IS NULL OR (tagged_contacts.created + s.matchdelay) < now()) AND (s.cid IS NULL OR tagged_contacts.created >= s.activated_at AND NOT COALESCE(tagged_contacts.skip_tags, false))) WHEN s.type = 'not_has_tag'::public.predicate THEN ( SELECT array_agg(aggregated_tags.email::public.citext) AS array_agg FROM ( SELECT tagged_contacts.email, array_agg(tagged_contacts.tagname) AS tags FROM tagged_contacts WHERE tagged_contacts.lid = s.lid AND (s.cid IS NULL OR tagged_contacts.ladded >= s.activated_at AND NOT COALESCE(tagged_contacts.skip_subscribed, false)) GROUP BY tagged_contacts.email) aggregated_tags WHERE NOT aggregated_tags.tags @> ARRAY[s.tagname]) WHEN s.type = 'received'::public.predicate THEN ( SELECT array_agg(DISTINCT mails_contacts_sent.email::public.citext) AS array_agg FROM mails_contacts_sent WHERE mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) < now())) WHEN s.type = 'not_received'::public.predicate THEN ( SELECT array_agg(x.email::public.citext) AS array_agg FROM ( SELECT tagged_contacts.email FROM tagged_contacts WHERE tagged_contacts.lid = s.lid EXCEPT SELECT DISTINCT mails_contacts_sent.email FROM mails_contacts_sent WHERE mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) < now())) x) WHEN s.type = 'opened'::public.predicate THEN ( SELECT array_agg(DISTINCT mails_contacts_opens.email::public.citext) AS array_agg FROM mails_contacts_opens WHERE mails_contacts_opens.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_opens.opentime + s.matchdelay) < now())) WHEN s.type = 'not_opened'::public.predicate THEN ( SELECT array_agg(x.email::public.citext) AS array_agg FROM ( SELECT tagged_contacts.email FROM tagged_contacts JOIN mails_contacts_sent USING (email) WHERE tagged_contacts.lid = s.lid AND mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) < now()) EXCEPT SELECT DISTINCT mails_contacts_opens.email FROM mails_contacts_opens WHERE mails_contacts_opens.mid = s.mid) x) WHEN s.type = 'clicked'::public.predicate THEN ( SELECT array_agg(DISTINCT mails_contacts_clicks.email::public.citext) AS array_agg FROM mails_contacts_clicks WHERE mails_contacts_clicks.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_clicks.clicktime + s.matchdelay) < now())) WHEN s.type = 'not_clicked'::public.predicate THEN ( SELECT array_agg(x.email::public.citext) AS array_agg FROM ( SELECT tagged_contacts.email FROM tagged_contacts JOIN mails_contacts_sent USING (email) WHERE tagged_contacts.lid = s.lid AND mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) < now()) EXCEPT SELECT DISTINCT mails_contacts_clicks.email FROM mails_contacts_clicks WHERE mails_contacts_clicks.mid = s.mid) x) WHEN s.type = 'event_triggered_first'::public.predicate THEN ( SELECT array_agg(DISTINCT x.email::public.citext) AS array_agg FROM ( SELECT e.email FROM events e JOIN tagged_contacts USING (email) WHERE tagged_contacts.lid = s.lid AND e.event OPERATOR(public.=) s.event GROUP BY e.email HAVING (s.matchdelay IS NULL OR (min(e.triggered_at) + s.matchdelay) < now()) AND (s.cid IS NULL OR min(e.triggered_at) >= s.activated_at)) x) WHEN s.type = 'event_triggered_last'::public.predicate THEN ( SELECT array_agg(DISTINCT x.email::public.citext) AS array_agg FROM ( SELECT e.email FROM events e JOIN tagged_contacts USING (email) WHERE tagged_contacts.lid = s.lid AND e.event OPERATOR(public.=) s.event GROUP BY e.email HAVING (s.matchdelay IS NULL OR (max(e.triggered_at) + s.matchdelay) < now()) AND (s.cid IS NULL OR max(e.triggered_at) >= s.activated_at)) x) ELSE NULL::public.citext[] END AS emails FROM tagged_segments s ), unnested AS ( SELECT segments_contacts.lid, segments_contacts.sid, segments_contacts.requirements, unnest(segments_contacts.emails) AS email FROM segments_contacts ) SELECT unnested.lid, unnested.sid, unnested.email FROM unnested GROUP BY unnested.lid, unnested.sid, unnested.email, unnested.requirements HAVING count(unnested.email) = unnested.requirements; Access method: heap mm_prod=> \d+ segments_with_contacts_2 Materialized view "aakpnews.segments_with_contacts_2" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+---------------+-----------+----------+---------+----------+--------------+------------- lid | integer | | | | plain | | sid | integer | | | | plain | | email | public.citext | | | | extended | | View definition: WITH tagged_contacts AS ( SELECT cl.lid, cl.email, cl.skip_preexisting_campaigns AS skip_subscribed, ct.skip_preexisting_campaigns AS skip_tags, cl.ladded, ct.tagname, ct.created FROM contacts_lists cl LEFT JOIN contacts_tags ct USING (email) WHERE cl.lstatus::bpchar = 'a'::bpchar ), tagged_segments AS ( SELECT s.lid, s.cid, s.sid, sp.type, sp.mid, sp.matchdelay, sp.tagname, sp.event, count(*) OVER (PARTITION BY s.sid) AS requirements, campaigns.activated_at FROM segments s LEFT JOIN campaigns USING (cid) JOIN segments_predicates sp USING (sid) WHERE s.archived_at IS NULL AND (s.cid IS NULL OR campaigns.activated_at IS NOT NULL) ), segments_contacts AS ( SELECT s.lid, s.sid, s.requirements, CASE WHEN s.type = 'subscribed'::public.predicate THEN ( SELECT array_agg(DISTINCT tagged_contacts.email::public.citext) AS array_agg FROM tagged_contacts WHERE tagged_contacts.lid = s.lid AND tagged_contacts.ladded >= s.activated_at AND (s.matchdelay IS NULL OR (tagged_contacts.ladded + s.matchdelay) < now()) AND NOT COALESCE(tagged_contacts.skip_subscribed, false)) WHEN s.type = 'has_tag'::public.predicate THEN ( SELECT array_agg(DISTINCT tagged_contacts.email::public.citext) AS array_agg FROM tagged_contacts WHERE tagged_contacts.lid = s.lid AND tagged_contacts.tagname OPERATOR(public.=) s.tagname AND (s.matchdelay IS NULL OR (tagged_contacts.created + s.matchdelay) < now()) AND (s.cid IS NULL OR tagged_contacts.created >= s.activated_at AND NOT COALESCE(tagged_contacts.skip_tags, false))) WHEN s.type = 'not_has_tag'::public.predicate THEN ( SELECT array_agg(aggregated_tags.email::public.citext) AS array_agg FROM ( SELECT tagged_contacts.email, array_agg(tagged_contacts.tagname) AS tags FROM tagged_contacts WHERE tagged_contacts.lid = s.lid AND (s.cid IS NULL OR tagged_contacts.ladded >= s.activated_at AND NOT COALESCE(tagged_contacts.skip_subscribed, false)) GROUP BY tagged_contacts.email) aggregated_tags WHERE NOT aggregated_tags.tags @> ARRAY[s.tagname]) WHEN s.type = 'received'::public.predicate THEN ( SELECT array_agg(DISTINCT mails_contacts_sent.email::public.citext) AS array_agg FROM mails_contacts_sent WHERE mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) < now())) WHEN s.type = 'not_received'::public.predicate THEN ( SELECT array_agg(x.email::public.citext) AS array_agg FROM ( SELECT tagged_contacts.email FROM tagged_contacts WHERE tagged_contacts.lid = s.lid EXCEPT SELECT DISTINCT mails_contacts_sent.email FROM mails_contacts_sent WHERE mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) < now())) x) WHEN s.type = 'opened'::public.predicate THEN ( SELECT array_agg(DISTINCT mails_contacts_opens.email::public.citext) AS array_agg FROM mails_contacts_opens WHERE mails_contacts_opens.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_opens.opentime + s.matchdelay) < now())) WHEN s.type = 'not_opened'::public.predicate THEN ( SELECT array_agg(x.email::public.citext) AS array_agg FROM ( SELECT tagged_contacts.email FROM tagged_contacts JOIN mails_contacts_sent USING (email) WHERE tagged_contacts.lid = s.lid AND mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) < now()) EXCEPT SELECT DISTINCT mails_contacts_opens.email FROM mails_contacts_opens WHERE mails_contacts_opens.mid = s.mid) x) WHEN s.type = 'clicked'::public.predicate THEN ( SELECT array_agg(DISTINCT mails_contacts_clicks.email::public.citext) AS array_agg FROM mails_contacts_clicks WHERE mails_contacts_clicks.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_clicks.clicktime + s.matchdelay) < now())) WHEN s.type = 'not_clicked'::public.predicate THEN ( SELECT array_agg(x.email::public.citext) AS array_agg FROM ( SELECT tagged_contacts.email FROM tagged_contacts JOIN mails_contacts_sent USING (email) WHERE tagged_contacts.lid = s.lid AND mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) < now()) EXCEPT SELECT DISTINCT mails_contacts_clicks.email FROM mails_contacts_clicks WHERE mails_contacts_clicks.mid = s.mid) x) WHEN s.type = 'event_triggered_first'::public.predicate THEN ( SELECT array_agg(DISTINCT x.email::public.citext) AS array_agg FROM ( SELECT e.email FROM events e JOIN tagged_contacts USING (email) WHERE tagged_contacts.lid = s.lid AND e.event OPERATOR(public.=) s.event GROUP BY e.email HAVING (s.matchdelay IS NULL OR (min(e.triggered_at) + s.matchdelay) < now()) AND (s.cid IS NULL OR min(e.triggered_at) >= s.activated_at)) x) WHEN s.type = 'event_triggered_last'::public.predicate THEN ( SELECT array_agg(DISTINCT x.email::public.citext) AS array_agg FROM ( SELECT e.email FROM events e JOIN tagged_contacts USING (email) WHERE tagged_contacts.lid = s.lid AND e.event OPERATOR(public.=) s.event GROUP BY e.email HAVING (s.matchdelay IS NULL OR (max(e.triggered_at) + s.matchdelay) < now()) AND (s.cid IS NULL OR max(e.triggered_at) >= s.activated_at)) x) ELSE NULL::public.citext[] END AS emails FROM tagged_segments s ), unnested AS ( SELECT segments_contacts.lid, segments_contacts.sid, segments_contacts.requirements, unnest(segments_contacts.emails) AS email FROM segments_contacts ) SELECT unnested.lid, unnested.sid, unnested.email FROM unnested GROUP BY unnested.lid, unnested.sid, unnested.email, unnested.requirements HAVING count(unnested.email) = unnested.requirements; Access method: heap Here you can see the results are different: mm_prod=> begin; BEGIN mm_prod=> refresh materialized view segments_with_contacts; REFRESH MATERIALIZED VIEW mm_prod=> select count(*) from segments_with_contacts where sid = 42259; count ------- 91 (1 row) mm_prod=> refresh materialized view segments_with_contacts_2; REFRESH MATERIALIZED VIEW mm_prod=> select count(*) from segments_with_contacts_2 where sid = 42259; count ------- 98 (1 row) How can we figure out what's wrong here? Since it's a materialized view using EXPLAIN doesn't give me much to go on: mm_prod=> explain select count(*) from segments_with_contacts where sid = 42259; QUERY PLAN ------------------------------------------------------------------------------ Aggregate (cost=83.52..83.53 rows=1 width=8) -> Seq Scan on segments_with_contacts (cost=0.00..83.29 rows=91 width=0) Filter: (sid = 42259) (3 rows) mm_prod=> explain select count(*) from segments_with_contacts_2 where sid = 42259; QUERY PLAN --------------------------------------------------------------------------------- Aggregate (cost=87.46..87.47 rows=1 width=8) -> Seq Scan on segments_with_contacts_2 (cost=0.00..87.20 rows=105 width=0) Filter: (sid = 42259) (3 rows) mm_prod=> select count(*) from segments_with_contacts_2 where sid = 42259; count ------- 98 (1 row) Any insights into this, and how to debug this further, is much appreciated. Best, -- a.