Dimitri Fontaine wrote:
Hi,

On Tuesday 23 September 2008, Andrew Dunstan wrote:
In any case, my agenda goes something like this:

    * get it working with a basic selection algorithm on Unix (nearly
      done - keep your eyes open for a patch soon)
    * start testing
    * get it working on Windows
    * improve the selection algorithm
    * harden code

I'm not sure whether your work will feature single-table restore splitting, but if that's the case, you could consider having a look at what I've done in pgloader. The parallel loading work there was asked for by Simon Riggs and Greg Smith, and it lets you test two different parallel algorithms. The aim was to have a "simple" testbed allowing PostgreSQL hackers to choose what to implement in pg_restore, so I still hope it'll prove useful someday :)



No. The proposal will perform exactly the same set of steps as single-threaded pg_restore, but in parallel. The individual steps won't be broken up.

Quite apart from anything else, loading an individual table's data in parallel would defeat clustering, and would also make it impossible to avoid WAL logging of the load (which I have made provision for).
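
For context, the usual condition for skipping WAL on a bulk load looks roughly like this - a sketch only, with a made-up table name and data file path, and assuming WAL archiving is off:

    BEGIN;
    -- Hypothetical table; the point is that it is created (or truncated)
    -- in the same transaction as the COPY that fills it.
    CREATE TABLE demo_load (id integer, payload text);
    -- With archiving off, this COPY need not be WAL-logged, because the
    -- table was created in this very transaction.
    COPY demo_load FROM '/tmp/demo_load.data';
    COMMIT;

Run one table's COPY from several parallel sessions instead and none of them can satisfy that condition, since the CREATE (or TRUNCATE) happened in a different transaction.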

The fact that custom archives are compressed by default would in fact make parallel loading of individual tables' data difficult with the present format. We'd have to do something like expanding the data on the client (which might not even have enough disk space) and then splitting it before loading it into the server. That's pretty yucky. Alternatively, each loader thread would need to decompress the data from the beginning and throw it away until it reached the point it wanted to start restoring from. Also pretty yucky.

Far better would be to provide for multiple data members in the archive and teach pg_dump to split large tables as it writes the archive. Then pg_restore would need comparatively little adjustment.

Also, of course, you can split tables yourself by partitioning them. That would buy you parallel data load with what I am doing now, with no extra work.
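
For instance (a sketch only, with made-up table and column names, using the inheritance-style partitioning we have today):

    CREATE TABLE measurements (ts timestamptz, value numeric);
    CREATE TABLE measurements_2007 (
        CHECK (ts >= '2007-01-01' AND ts < '2008-01-01')
    ) INHERITS (measurements);
    CREATE TABLE measurements_2008 (
        CHECK (ts >= '2008-01-01' AND ts < '2009-01-01')
    ) INHERITS (measurements);

pg_dump writes a separate data entry for each child table, so a parallel restore can load measurements_2007 and measurements_2008 at the same time.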

In any case, data loading is very far from being the only problem. One of my clients has long-running restores where the data load takes roughly 20% of the time - the rest goes into index creation and the like. No amount of table splitting will make a huge difference to them, but parallel processing will. Against that, if your problem is loading one huge table, this won't help you much. However, that's not a pattern I see often - most of my clients seem to have several large tables plus a boatload of indexes. They will benefit a lot.

cheers

andrew

