Title: Modifying COPY TO

I am interested in hacking COPY TO such that one can specify that
rows are copied in a certain index order.  I got as far as
src/backend/commands/copy.c:CopyTo(), and it looks like I would need
to modify the call to heap_beginscan() so that it uses a key.  However,
I couldn't figure out how to provide one, or if I'm even looking at the
right area.  Ideally, this behavior would be specified with a flag,
perhaps: "WITH INDEX <index_name>" or "WITH PRIMARY KEY"
or something similar.

The motivation for this change is as follows.  I have a fairly large
database (10 million+ records) that mirrors the data in a proprietary
system.  The only access to that data is through exported flat files.
Currently, those flat files are copied directly into a staging area in the
db via a COPY FROM, the actual tables are truncated, and the
staging data is inserted into the live tables.  Since the data is read-only,
it doesn't matter that it is recreated every day.  However, as you
can imagine, the import process takes quite a while (several hours).
Also, rebuilding the db from scratch every day loses any statistical
information gathered from the execution of queries during the day.

A possibility that I would like to pursue is to keep the staging data
from the previous day, do a COPY TO, import the new data into
another staging table with a COPY FROM, then export the fresh
data with another COPY TO.  Then, I can write a fast C/C++
program to do a line-by-line comparison of each record, isolating
the ones that have changed from the previous day.  I can then
emit those records in a change file that should be relatively small
and easy to update.  Of course, this scheme can only work if
COPY TO emits the records in a reliable order.

Any assistance on this project would be greatly appreciated.  The
best I can see, I'm stuck on line 1053 from copy.c:

    scandesc = heap_beginscan(rel, mySnapshot, 0, NULL);

I suspect that I want it to look like this:

    scandesc = heap_beginscan(rel, mySnapshot, 1, key);

where 'key' is an appropriately constructed ScanKey.  It looks
like I want to call ScanKeyEntryInitialize(), but I'm not sure what
parameters I need to pass to it to get an index or the primary
key.  I mostly need help building the ScanKey object.  I think I
can figure out how to hack the custom option, etc.  I should
mention that I am using the 7.4.7 codebase on Linux 2.4.

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129


Reply via email to