On 17.12.2010 00:29, Andres Freund wrote:
> On Thursday 16 December 2010 19:33:10 Joachim Wieland wrote:
>> On Thu, Dec 16, 2010 at 12:48 PM, Heikki Linnakangas
>> <heikki.linnakan...@enterprisedb.com> wrote:
>>> As soon as we have parallel pg_dump, the next big thing is going to be
>>> parallel dump of the same table using multiple processes. Perhaps we
>>> should prepare for that in the directory archive format, by allowing the
>>> data of a single table to be split into multiple files. That way
>>> parallel pg_dump is simple, you just split the table in chunks of
>>> roughly the same size, say 10GB each, and launch a process for each
>>> chunk, writing to a separate file.
>> How exactly would you "just split the table in chunks of roughly the
>> same size"? Which queries should pg_dump send to the backend? If it
>> just sends a bunch of WHERE queries, the server would still scan the
>> same data several times since each pg_dump client would result in a
>> seqscan over the full table.
> I would suggest implementing support for tidscans and doing it in
> segment size...

I don't think there's any particular gain from matching the server's
data file segment size, although 1GB does sound like a good chunk size
for this too.
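
For illustration only, here is a rough sketch of the kind of per-worker
query a TID-based split could use. The tid range comparison it relies on
is not supported at the time of this thread; that is the tidscan support
being suggested above. The table name, chunk size, and 8 kB block size
are assumptions for the example:

    -- number of heap blocks in the table (8 kB default block size assumed)
    SELECT pg_relation_size('big_table') / 8192 AS total_blocks;

    -- one worker dumps one ~1GB block range (131072 blocks of 8 kB),
    -- assuming hypothetical tid range comparisons backed by a TID scan
    COPY (
        SELECT * FROM big_table
        WHERE ctid >= '(0,0)'::tid
          AND ctid <  '(131072,0)'::tid
    ) TO STDOUT;

Each worker would get its own non-overlapping block range, so no two
processes would read the same heap pages.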
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com