[GENERAL] Query Parallelizing with dblink

AI Rumman Wed, 30 Jan 2013 07:27:47 -0800

Hi All,

I made an idea to utilize multiple core for Postgresql query. I used dblink
to achieve this task.
You may find it:
http://www.rummandba.com/2013/01/postgresql-query-parallelizing-with.html



It is just a basic idea and completely usable for my data model. But I
share it with you all so that you may add some more good ideas here.

------ Added the document here
------------------------------------------------------------------

I have three tables with almost 10 millions of records in each. The tables
are:

customer

account

tickets

"customer" table holds record for all kind of customers which are related
to account or tickets.

We need to generate a report with of customer and its related accounts or
tickets.

The basic sql is like:

select *

from

(

select c.custid,

case when a.accountid is not null then

      a.accountid

     when t.ticketid is not null then

      t.ticketid

end

as relatedid

from customer as c

left join account as a  on c.custid = a.accountid and type ='Accounts'

left  join tickets as t on c.custid = t.ticketid and type ='HelpDesk'

where c.deleted = 0

) as q

where relatedid is not null


I have all the required indexes. But the query was taking too much time.

One of the bottleneck that I always feel with Postgresql is its lack of
query parallelism technique. Good news is that, the great developers are
working on it.

However, I have to improve the query performance at this moment. So I make
a plan to divide the query in two parts and then execute each part
asynchronously and then collect the result.


To achieve this, I make the function qry_parallel_fn. This function create
two separate dblink connection conn1 and conn2 and execute two different
queries in async mode.

There is a while loop which checks if both the connections have completed
the task or not. If yes, then the function return results.


CREATE OR REPLACE FUNCTION qry_parallel_fn() RETURNS SETOF RECORD AS $$

DECLARE

   v_qry1 text;

   v_qry2 text;

   cur1 cursor is

   select *

   from dblink_get_result('conn1') as t1(custid int, relatedid int);



   cur2 cursor is

   select *

   from dblink_get_result('conn2') as t1(custid int, relatedid int);



   v_closed smallint;



BEGIN



     v_qry1 := 'select custid, accountid as relatedid from customer c inner
join account a on c.custid = a.accountid where c.deleted = 0';

     RAISE NOTICE 'vqry1 = %' , v_qry1;

     v_qry2 := 'select custid, ticketid as relatedid from customer c inner
join tickets as t on c.custid = t.ticketid where c.deleted = 0';

   PERFORM dblink_connect('conn1','dbname=rumman');

   PERFORM dblink_connect('conn2','dbname=rumman');

     PERFORM dblink_send_query('conn1',v_qry1);

     PERFORM dblink_send_query('conn2',v_qry2);



     v_closed := 0;

     WHILE v_closed <> 2 loop

       if check_conn_is_busy('conn1') = 0 then

          v_closed := v_closed + 1;

       end if;

       if check_conn_is_busy('conn2') = 0 then

          v_closed := v_closed + 1;

       end if;

     END LOOP;



     FOR rec IN cur1

     LOOP

       RETURN NEXT rec;

     END LOOP;



     FOR rec IN cur2

     LOOP

       RETURN NEXT rec;

     END LOOP;



     PERFORM dblink_disconnect('conn1');

     PERFORM dblink_disconnect('conn2');



     RETURN;

END;

$$

language 'plpgsql'


--Execute

--select * from test_fn() as t1(c int, d int);

-- select count(*) from test_fn() as t1(c int, d int);


CREATE OR REPLACE FUNCTION check_conn_is_busy(conn text) RETURNS INT AS $$

DECLARE

  v int;

BEGIN

   SELECT dblink_is_busy(conn) INTO v;

   RETURN v;

END;

$$

language 'plpgsql'


I was monitoring the server performance and found that it was using two cpu
cores to get the result and improve the query timing a bit.

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


Let me know your thoughts on it.


Thanks.

[GENERAL] Query Parallelizing with dblink

Reply via email to