On Mon, Jul 23, 2012 at 2:52 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Mon, Jul 23, 2012 at 4:12 PM, Kenneth Tilton <ktil...@mcna.net> wrote: > > My mental model of the EXISTS clause must be off. This snippet appears at > > the end of a series of WITH clauses I suspect are irrelevant: > > > >> with etc etc ... , cids as > >> (select distinct c.id from ddr2 c > >> join claim_entries ce on ce.claim_id = c.id > >> where (c.assigned_ddr = 879 > >> or exists (select 1 from ddr_cdt dc > >> where > >> dc.sys_user_id = 879 > >> and dc.document_type = c.document_type > >> -- makes it faster: and (dc.cdt_code is null or dc.cdt_code = > ce.cpt_code) > >> ))) > >> > >> select count(*) from cids > > > > > > If I uncomment the bit where it says "make it faster" I get decent > response > > and the graphical analyze display shows the expected user+doctype+cdtcode > > index is being used (and nice thin lines suggesting efficient lookup). > > > > As it is, the analyze display shows the expected user+doctype index* > being > > used but the lines are fat, and performance is an exponential disaster. > > > > * I created the (to me ) redundant user+doctype index trying to get > Postgres > > to Do the Right Thing(tm), but I can see that was not the issue. > > > > I presume the reason performance drops off a cliff is because there can > be > > 9000 cdt_codes for one user+doctype, but I was hoping EXISTS would just > look > > to see if there was at least one row matching user+doctype and return its > > decision. I have tried select *, select 1, and limit 1 on the nested > select > > to no avail. > > > > Am I just doing something wrong? I am a relative noob. Is there some > other > > hint I can give the planner? > > hard to say without having the explain analyze output. also it's not > clear why you need to use WITH, at least for the terminating query. > I'd just do: > > select count(*) from > ( > inner_query > ) > OK. Here is the full query: with ddr as ( select c.id ,case when c.portal_user_id is null then u.provider_facility_id else pu.provider_facility_id end provider_facility_id from claims c left join sys_users u on u.id = c.created_by left join portal_users pu on pu.id = c.portal_user_id WHERE c.deleted = 0 AND c.status >= 0 AND (c.created_by is not null or c.portal_user_id is not null) AND true not in ( select ineligible_code_id in (46,65) from claim_carcs cc where c.id = cc.claim_id and cc.deleted = 0 ) AND (false OR c.document_type = 0) AND (false OR c.group_plan_id = 44) limit 1500 ) ,ddr2 as ( select c.id , c.document_type , c.radiographs , c.nea_number , c.assigned_ddr , d.provider_facility_id as submitting_facility_id , count(ca.id) as claim_attachments_count , cast(exists (select 1 from triples where s = c.id and sda='claims' and p = 'ddr-review-passed-on-by') as boolean) as passedon from ddr d inner join claims c on d.id = c.id join claim_attachments ca on c.id = ca.claim_id group by c.id , submitting_facility_id having ((nullif(trim(c.nea_number, ' '),'') is not null) or case transmission_method when 'P' then count(distinct ca.id) > 1 else count(distinct ca.id) > 0 end or c.radiographs > 0)) , cids as (select distinct c.id from ddr2 c join claim_entries ce on ce.claim_id = c.id where (c.assigned_ddr = 879 or exists (select 1 from ddr_cdt dc where dc.sys_user_id = 879 and dc.document_type = c.document_type --and (dc.cdt_code is null or dc.cdt_code = ce.cpt_code) ))) select count(*) from cids And the explain output: "Aggregate (cost=56060.60..56060.61 rows=1 width=0)" " CTE ddr" " -> Limit (cost=306.29..16203.83 rows=1500 width=16)" " -> Nested Loop Left Join (cost=306.29..7442626.75 rows=702214 width=16)" " -> Hash Left Join (cost=306.29..7244556.97 rows=702214 width=12)" " Hash Cond: (c.created_by = u.id)" " -> Index Scan using claims_lca1 on claims c (cost=0.00..7230212.96 rows=702214 width=12)" " Index Cond: ((deleted = 0) AND (status >= 0) AND (group_plan_id = 44) AND (document_type = 0))" " Filter: (((created_by IS NOT NULL) OR (portal_user_id IS NOT NULL)) AND (NOT (SubPlan 1)))" " SubPlan 1" " -> Index Scan using claim_carcs_claim_id on claim_carcs cc (cost=0.00..9.23 rows=1 width=4)" " Index Cond: (c.id = claim_id)" " Filter: (deleted = 0)" " -> Hash (cost=224.46..224.46 rows=6546 width=8)" " -> Seq Scan on sys_users u (cost=0.00..224.46 rows=6546 width=8)" " -> Index Scan using portal_users_pkey on portal_users pu (cost=0.00..0.27 rows=1 width=8)" " Index Cond: (id = c.portal_user_id)" " CTE ddr2" " -> GroupAggregate (cost=25714.40..28093.98 rows=286 width=27)" " Filter: ((NULLIF(btrim((c.nea_number)::text, ' '::text), ''::text) IS NOT NULL) OR CASE c.transmission_method WHEN 'P'::bpchar THEN (count(DISTINCT ca.id) > 1) ELSE (count(DISTINCT ca.id) > 0) END OR (c.radiographs > 0))" " -> Sort (cost=25714.40..25715.11 rows=286 width=27)" " Sort Key: c.id, d.provider_facility_id" " -> Nested Loop (cost=0.00..25702.73 rows=286 width=27)" " -> Nested Loop (cost=0.00..12752.74 rows=1500 width=27)" " -> CTE Scan on ddr d (cost=0.00..30.00 rows=1500 width=8)" " -> Index Scan using claims_pkey on claims c (cost=0.00..8.47 rows=1 width=19)" " Index Cond: (id = d.id)" " -> Index Scan using claim_attachments_claim on claim_attachments ca (cost=0.00..8.61 rows=2 width=8)" " Index Cond: (claim_id = c.id)" " SubPlan 3" " -> Index Scan using triples_s_idx on triples (cost=0.00..8.28 rows=1 width=0)" " Index Cond: (s = c.id)" " Filter: ((sda = 'claims'::text) AND (p = 'ddr-review-passed-on-by'::text))" " SubPlan 4" " -> Bitmap Heap Scan on triples (cost=102.70..1010.15 rows=823 width=8)" " Recheck Cond: (p = 'ddr-review-passed-on-by'::text)" " Filter: (sda = 'claims'::text)" " -> Bitmap Index Scan on triples_p_idx (cost=0.00..102.49 rows=3497 width=0)" " Index Cond: (p = 'ddr-review-passed-on-by'::text)" " CTE cids" " -> HashAggregate (cost=11759.51..11760.52 rows=101 width=4)" " -> Nested Loop (cost=0.00..11722.94 rows=14627 width=4)" " -> CTE Scan on ddr2 c (cost=0.00..112.75 rows=144 width=4)" " Filter: ((assigned_ddr = 879) OR (alternatives: SubPlan 6 or hashed SubPlan 7))" " SubPlan 6" " -> Seq Scan on ddr_cdt dc (cost=0.00..134293.58 rows=361282 width=0)" " Filter: ((sys_user_id = 879) AND (document_type = c.document_type))" " SubPlan 7" " -> Bitmap Heap Scan on ddr_cdt dc (cost=20292.74..73868.80 rows=1083845 width=4)" " Recheck Cond: (sys_user_id = 879)" " -> Bitmap Index Scan on "ddr-cdt-idx-user-doc" (cost=0.00..20021.78 rows=1083845 width=0)" " Index Cond: (sys_user_id = 879)" " -> Index Scan using claim_entries_claim_id on claim_entries ce (cost=0.00..79.35 rows=102 width=4)" " Index Cond: (claim_id = c.id)" " -> CTE Scan on cids (cost=0.00..2.02 rows=101 width=0)" More interesting: I tried reducing the complex query to a simpler query and what I saw was that my mental model of EXISTS is fine. :) It was efficient in the way I expected, and faster than the version that did the last test (the cdt_code test). Now I just have to find out why it is slower in vivo. Thx, ken -- Kenneth Tilton *Director of Software Development* *MCNA Dental Plans* 200 West Cypress Creek Road Suite 500 Fort Lauderdale, FL 33309 954-730-7131 X181 (Office) 954-628-3347 (Fax) 1-800-494-6262 X181 (Toll Free) ktil...@mcna.net <glip...@mcna.net> (Email) www.mcna.net (Website) CONFIDENTIALITY NOTICE: This electronic mail may contain information that is privileged, confidential, and/or otherwise protected from disclosure to anyone other than its intended recipient(s). Any dissemination or use of this electronic mail or its contents by persons other than the intended recipient(s) is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail so that we may correct our internal records. Please then delete the original message. Thank you.