Dimitri Fontaine wrote:
Hi,

On Tuesday 23 September 2008, Andrew Dunstan wrote:
In any case, my agenda goes something like this:

    * get it working with a basic selection algorithm on Unix (nearly
      done - keep your eyes open for a patch soon)
    * start testing
    * get it working on Windows
    * improve the selection algorithm
    * harden code

I'm not sure whether your work will feature single-table restore splitting, but if that's the case, you could consider having a look at what I've done in pgloader. The parallel loading work there was asked for by Simon Riggs and Greg Smith, and it lets you test two different parallel algorithms. The aim was to have a "simple" testbed allowing PostgreSQL hackers to choose what to implement in pg_restore, so I still hope it'll prove useful someday :)



No. The proposal will perform exactly the same set of steps as single-threaded pg_restore, but in parallel. The individual steps won't be broken up.

Quite apart from anything else, loading an individual table's data in parallel would defeat clustering, and would also make it impossible to avoid WAL logging of the load (which I have made provision for).
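
For context, the usual condition for skipping WAL on a bulk load looks roughly like this - a sketch only, with a made-up table name and data file path, and assuming WAL archiving is off:

    BEGIN;
    -- Hypothetical table; the point is that it is created (or truncated)
    -- in the same transaction as the COPY that fills it.
    CREATE TABLE demo_load (id integer, payload text);
    -- With archiving off, this COPY need not be WAL-logged, because the
    -- table was created in this very transaction.
    COPY demo_load FROM '/tmp/demo_load.data';
    COMMIT;

Run one table's COPY from several parallel sessions instead and none of them can satisfy that condition, since the CREATE (or TRUNCATE) happened in a different transaction.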

The fact that custom archives are compressed by default would in fact make parallel loading of individual tables' data difficult with the present format. We'd have to do something like expanding the data on the client (which might not even have enough disk space) and then splitting it before loading it into the server. That's pretty yucky. Alternatively, each loader thread would need to decompress the data from the beginning and throw it away until it reached the point it wanted to start restoring from. Also pretty yucky.

Far better would be to provide for multiple data members in the archive and teach pg_dump to split large tables as it writes the archive. Then pg_restore would need comparatively little adjustment.

Also, of course, you can split tables yourself by partitioning them. That would buy you parallel data load with what I am doing now, with no extra work.
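
For instance (a sketch only, with made-up table and column names, using the inheritance-style partitioning we have today):

    CREATE TABLE measurements (ts timestamptz, value numeric);
    CREATE TABLE measurements_2007 (
        CHECK (ts >= '2007-01-01' AND ts < '2008-01-01')
    ) INHERITS (measurements);
    CREATE TABLE measurements_2008 (
        CHECK (ts >= '2008-01-01' AND ts < '2009-01-01')
    ) INHERITS (measurements);

pg_dump writes a separate data entry for each child table, so a parallel restore can load measurements_2007 and measurements_2008 at the same time.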

In any case, data loading is very far from being the only problem. One of my clients has long-running restores where the data load takes roughly 20% of the time - the rest goes into index creation and the like. No amount of table splitting will make a huge difference to them, but parallel processing will. Against that, if your problem is loading one huge table, this won't help you much. However, that's not a pattern I see often - most of my clients seem to have several large tables plus a boatload of indexes. They will benefit a lot.

cheers

andrew

