Thanks Merlin, Well... sorry, It could be and my bad english... but let me explain chronologicaly things...
I have first written concrete case... http://postgresql.1045698.n5.nabble.com/PostgreSQL-planner-tp5749427.html But because of I recognized the pattern - always is problem with JOIN to a view... I have written this abroad generic question.... Because of, I think, Postgres have problem with JOIN to a view in general...So probably someone before me have had the same problem - and if that is the case I just wanted to hear thier solution... But from others examples, and some tests EXPLAIN ANALYZE I have done... i.e. SELECT t1.a FROM t1 LEFTJOIN t2 USING (a) Planer includes some actions related to t2 - what are not necessary at all... again - it is just my opinion :) (Please, don't take this - I don't know... as some most important thing...) So that are "small" problems - on our simplified examples - what have big impact in performance on a bit complex examples... So what we have indentified until know - solution to our problem with views - is always: "rephrase the question" (not indexes - they exist - just not used...) for example: SELECT view.* FROM view INNER JOIN t1 USING (col1) WHERE t1.col2 = 1 to get better performance, you need to say: SELECT view.* FROM view WHERE col1 = (SELECT t.col1 FROM t1 WHERE t1.col2 = 1) Logically - that are the same questions - result is the same.... just from some unknown reason to me - postgres in first case picks wrong plan - and we got very bad performance... :( So solution to our problem - is to add "rephrase the question" tier... (analyze what "input question" is - and transform it to better SQL for Postgres" And fortunately we have that flexibility in our app... And how the things are - we will need to do it,,,. So, if input question is: SELECT t1.a FROM t1 LEFT JOIN t2 USING (a) - transform it to: SELECT t1.a FROM t1 etc... But, don't you think that would be better for Postgres planner in general? Nowhere in our examples are cases like where A || B = 'x' ... or WHERE volatile_function(a, b) = 5... etc... Materalisation - well that is another thing why we use VIEWS.... So for calculated properties of the things - we use SQL stable functions... i.e the thing: Customer - is the VIEW in postgres: cust_id, cust_name, blablabla...columns, customer_balance customer_balance - is calculated property of Entity: Customer - postgres function actually... if custmer_balabce sucks in perform - involve mat view - and the function will return value from mat_view - instead of to do calculation... But again - we dont have the problem with that :) We have the problem with: SELECT c.* FROM customers_view c INNER JOIN invoices USING (customer_id) WHERE invoice_id = 156 And solution to our problem is: "rephrase the question" :) Kind Regards, Misa 2013/3/25 Merlin Moncure <mmonc...@gmail.com> > On Sat, Mar 23, 2013 at 9:25 PM, Misa Simic <misa.si...@gmail.com> wrote: > > HI, > > > > When I have met PostgreSQL for a first time - I have been really amazed - > > with many things... > > > > But how we started to use it - and data jumps in - we meet performance > > problems... > > > > Now, it is a bit tricky... any concrete performance problem - can be > solved > > on some way... > > > > However, I am more concerned, that "our way" how we do the things - is > not > > best way for Postgres... > > > > The thing is - our tables are highly normalised - so lot of joins... > (NULLS > > do not exist in the tables) > > > > to make things simplier we use VIEWS to "denormalise" data - though it is > > not that much we care about denormalisation - it is more to make things > > simplier and less error prone... > > > > So every "thing" (Customer, product, transaction, employee, whatever) is > > built up from more tables... I am not sure we have even one "thing" > built up > > just from 1 table... > > > > Lot of "thing properties" are actually calculations: i.e. Invoice > Amount, or > > for Employee: > > We have in the one table: first_name and last_name fields, but Full name > is > > concatented as: > > > > last_name || ',' || first_name > > > > So, whenever we need employee full name - is it on Employee Info window > - or > > in an Invoice as Salesperson, or in Payslip report...instead of to > > everywhere have above formula - we have function... > > but again, instead of to developer think about each possible function for > > concrete thing - we have made for each entity - the view... what have all > > relevant joins - and all relevant calculated properties... about the > > thing... > > > > I have had a thought - somewhere read (but now I am more convenient I > have > > read it wrong) that planner is that smart that if we we have: > > > > CREATE VIEW person_view AS > > SELECT person_id, first_name, last_name, fn_full_name(id) as full_name, > > date_of_birth, age(date_of_birth) from person LEFT JOIN > > person_date_of_birth USING (person_id) > > > > SELECT first_name FROM person_view WHERE id = 1 > > > > Planner is smart and: > > -will not care about joins - you ask for the field(s) - what do not > belong > > to other tables - both belong to 1 - and it is pk! > > -will not care about functions - you havent asked for any field what is > > function in your query... > > > > However - how we met more and more problems with our approach... and > > spending more time on > > EXPLAIN ANALYZE - instead of on business problems... It seems things are > not > > that good... > > > > for simple questions as above - results are acceptable - even looking > into > > EXPLAIN ANALYZE i would not say it is the best possible plan... (i.e. > > planner spending time on Seq Scan on person_date_of_birth_table - and > filter > > it - even no need to think about that table at all - LEFT JOIN (show me > > columns - if there is a matching row for pk column) - so could be check > via > > index -however - there is no any column from that table in the query - I > > would simple discard that table from plan.... > > > > So query > > > > SELECT id FROM view WHERE id = 5 (view is SELECT * FROM table1 LEFT JOIN > > table2) > > > > I would treat the same as: > > SELECT id FROM table1 = 5 > > > > ok in INNER JOIN it requires additional confimration - but even there is > FK > > to PK join - that confirmation is not needed iether - but in our cases > it is > > always FK to PK... > > > > However - if we need to involve more "entities"/views - from some unknown > > reason to me - postgres always picks bad plan... > > > > i.e. list of employees what work in New York > > > > we have employees_contract table: > > contract_id, person_id, department_id, > > > > a lot of others tables, but to make it shorter: > > > > > > Department_view > > > > Buidlings_view > > > > > > and now query: > > SELECT full_name FROM person_view INNER JOIN emplyee_contract USING > > (person_id) INNER JOIN department_view USING (department_id) INNER JOIN > > buildings_view USING (building_id) WHERE city_id = 'NY' > > > > > > from some unknown reason - gives bad plan - then if we "refactor" query > and > > send different question - we get good result... I am pretty sure planner > > should be capable to "rephrase" my question instead of me... > > > > I would like to hear your expirience with VIEWS in postgres... And some > kind > > of best practice/advice for described situation... So far it looks to me > > there is no way - to make things ready for any specific question - every > > "request" will need specific SQL syntax to drive planner in acceptable > > direction... > > > You asked some broad questions so you are going to get broad answers. > > *) query planner is very complicated and changes are very incremental. > only a very, very small number of people (Tom especially) are capable > of making major changes to it. some known planner issues that might > get fixed in the short term are better handling of quals through UNION > ALL and/or pushing quals through partitioned windows functions. these > are documented shortcomings -- other improvements have to be looked > through the lens of 'what else did you break', including, > unfortunately, plan time. > > *) filtering in predicate expressions (select * from foo where a || b > = 'x') is going to always suck. consider optimizing via indexes > and/or materialization > > *) excessive use of views is for data abstraction (a technique I > advocate) is on one side of a tradeoff between materlization and run > time performance. if you are exceeding your ability to quickly write > clean queries, perhaps it's time to start thinking about some > materialization steps. especially you should be asking yourself if > users are needing 100% up to date data (if not, materialization > requirements relax significantly) > > *) better, more experienced programmers write faster, cleaner code. > this is true in all backends. @ two years in, you have some learning > to do that -hackers will not be able to solve for you. > > merlin >