I am adding a runnable example to demonstrate my point.

CREATE OR REPLACE FUNCTION initialVersion()
RETURNS VOID AS $$
DECLARE
  testDate DATE;
BEGIN
  FOR i IN 1..999999 LOOP
    BEGIN
      -- WHERE 1=0 returns no row, so INTO STRICT raises NO_DATA_FOUND
      -- on every iteration and the handler below swallows it
      SELECT now() INTO STRICT testDate WHERE 1=0;
    EXCEPTION WHEN OTHERS THEN
      NULL;
    END;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION secondVersion()
RETURNS VOID AS $$
DECLARE
  testDate DATE;
BEGIN
  FOR i IN 1..999999 LOOP
    -- without STRICT no exception is raised; testDate simply stays NULL
    SELECT now() INTO testDate WHERE 1=0;
    IF testDate IS NULL THEN
      CONTINUE;
    END IF;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT initialVersion(); -- 19.7 seconds

SELECT secondVersion(); -- 5.2 seconds


As you can see, there is a difference of almost 15 seconds.
In the example that I provided initially, the difference is even bigger, because 
the SELECT FROM t1 runs against complex data and takes more time to execute 
than the simple SELECT in this second example.
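
For anyone who wants something closer to that original situation, here is a sketch 
of a variant where the inner SELECT reads from a real (empty) table instead of a 
constant. The table t1_demo and its columns are made up for this example, and I 
have not included timings for it:

CREATE TABLE IF NOT EXISTS t1_demo (
  id integer PRIMARY KEY,
  charge numeric NOT NULL
);

CREATE OR REPLACE FUNCTION tableVersion()
RETURNS VOID AS $$
DECLARE
  recordCharge numeric;
BEGIN
  FOR i IN 1..999999 LOOP
    BEGIN
      -- the table is empty, so INTO STRICT raises NO_DATA_FOUND every time,
      -- just like the SELECT FROM t1 did in my original function
      SELECT charge INTO STRICT recordCharge FROM t1_demo WHERE id = i;
    EXCEPTION WHEN OTHERS THEN
      NULL;
    END;
  END LOOP;
END;
$$ LANGUAGE plpgsql;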

Thanks,
Denisa


From: Denisa Cirstescu
Sent: Monday, September 25, 2017 7:12 PM
To: pgsql-general@postgresql.org
Subject: Catching errors inside a LOOP is causing performance issues

Hi all,

I had a function with a performance issue:

totalCharge := 0;
FOR myRecord IN ... LOOP
    ......
    IF severalConditionsAreMet THEN
        BEGIN
            SELECT t1.charge INTO STRICT recordCharge
            FROM t1
            WHERE t1.id = myRecord.id AND otherComplexConditionsHere;

            totalCharge := totalCharge + recordCharge;

            ...........
        EXCEPTION
            WHEN OTHERS THEN
                NULL;
        END;
    END IF;
END LOOP;

The function was being called 232 times (not counting the number of times the 
code inside the FOR loop was executed).
The IF inside the FOR LOOP ended up being reached 4466 times and was taking 561 
seconds to complete all 4466 iterations.
For the particular data set that I had, the IF branch was always entered, the 
SELECT above never returned any data, and the code reached the EXCEPTION branch 
each and every time.
I have changed the code to:

totalCharge := 0;
FOR myRecord IN ... LOOP
    ......
    IF severalConditionsAreMet THEN

        SELECT t1.charge INTO recordCharge
        FROM t1
        WHERE t1.id = myRecord.id AND otherComplexConditionsHere;

        IF (recordCharge IS NULL) THEN
            CONTINUE;
        END IF;

        totalCharge := totalCharge + recordCharge;

        ...........

    END IF;

END LOOP;

Please note that the t1.charge column has a NOT NULL constraint defined on it, 
so a NULL recordCharge can only mean that the SELECT returned no row.
This time, the code inside the IF takes 1-2 seconds to complete all 4466 
iterations.
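
As a side note, the same "no row" check can also be written with PL/pgSQL's 
FOUND variable, which does not rely on the NOT NULL constraint at all. A 
minimal, runnable illustration (the SELECT here is only a placeholder):

DO $$
DECLARE
  recordCharge numeric;
BEGIN
  SELECT 1 INTO recordCharge WHERE false;  -- matches no row
  -- FOUND is set to false when SELECT ... INTO returns no row
  IF NOT FOUND THEN
    RAISE NOTICE 'no row returned, so the loop would CONTINUE here';
  END IF;
END;
$$;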

Basically, all I did was replace the

BEGIN
...
EXCEPTION
....
END;

with

IF conditionIsNotMet THEN
    CONTINUE;
END IF;

Can someone please explain to me why this worked?
What happened behind the scenes?
I suspect that when you catch exceptions inside a LOOP and the code ends up 
raising an exception, Postgres can't use cached plans for that code, so it ends 
up re-planning it at each iteration, and this causes the performance issue.
Is my assumption correct?

Thanks a lot,
Denisa Cîrstescu
