Alex J. Avriette wrote:

And if you are concerned about the query's cost vs its actual output
(you weren't clear in your original message), there is the 'iscachable'
pragma for functions.

I was concerned that my function was being called 4 times for each row of the result. The cost of the function call is quite high, so the cost of the query was 4 times what I expected.

So my concern was the query's cost.

    SELECT query.i,
           query.squared AS test1,
           query.squared + 1 AS test2,
           query.squared + 2 AS test3,
           query.squared + 3 AS test4
    FROM (
        SELECT i,
               square_it(i) AS squared
        FROM foo
        OFFSET 0
    ) query;

As Tom explained to me, the optimizer flattens the subselect, so it then becomes:

    SELECT i,
           square_it(i) AS test1,
           square_it(i) + 1 AS test2,
           square_it(i) + 2 AS test3,
           square_it(i) + 3 AS test4
    FROM   foo;

Thus, each reference to query.squared in the outer query is replaced with a call to the function.

In this trivial example, that's not really a performance issue. But my real-world function calculates about 8 values and returns them as a record, which I then pick apart in the outer query. When the optimizer flattens the subquery and each reference to the function result is replaced with a call to the function, it makes for a very expensive query.

    CREATE TYPE patient_balances_type AS (patient_id INTEGER,
                                          account_id INTEGER,
                                          date DATE,
                                          due_now INTEGER,
                                          future_due INTEGER,
                                          copay_balance INTEGER,
                                          expected_insurance INTEGER,
                                          total_balance INTEGER,
                                          contract_amount INTEGER,
                                          real_due_now INTEGER,
                                          real_future_due INTEGER);

    CREATE OR REPLACE FUNCTION patient_balances(INTEGER, INTEGER, DATE)
    RETURNS patient_balances_type AS '...' LANGUAGE 'plpgsql';

My actual query looked something like:

        SELECT  query.appointment_id AS appointment_id,
                query.date,
                query.start_time,
                query.duration,
                query.patient_id,
                (query.bal).total_balance,
                (query.bal).expected_insurance,
                (query.bal).future_due,
                (query.bal).due_now
        FROM
        (
            SELECT appt.appointment_id,
                   appt.start_time,
                   appt.duration,
                   appt.date,
                   p.patient_id,
                   p.account_id,
                   patient_balances(p.patient_id,
                                    p.account_id,
                                    d.system_date) AS bal
            FROM
                   patients AS p
                   JOIN appointments appt ON
                       (p.patient_id = appt.patient_id)
                   JOIN system_date d ON
                       (appt.date = d.system_date)
        ) query;

When that subquery was flattened, each reference to query.bal was replaced with a call to patient_balances(). Which was pretty expensive.

Tom's suggestion off using OFFSET 0 to cause the optimizer not to flatten the subquery is exactly what I was looking for.

Is there a better way to be picking apart the result of my function than putting it in a subquery, since the subquery will be flattened by default? There are a bunch of places I've done this that I need to go back to now and add an OFFSET 0 because I didn't realize the optimizer would negate my cleverly crafted record-dissecting subselect. :)

Or is there a way to advise the optimizer of the cost of my function, so that it will choose to not flatten the subquery (since the total cost of doing that will be higher)? Or should the optimizer be assuming that function calls are fairly expensive by default, and not flattening subqueries that have function calls? So it's not a bug, but maybe a feature request? :)

Thanks!

- Chris


Attachment: smime.p7s
Description: S/MIME Cryptographic Signature

Reply via email to