-----Original Message-----
From: Andreas Pflug [mailto:[EMAIL PROTECTED] Sent: Friday, August 15, 2003 11:36 AM
To: Dann Corbit
Cc: Stephan Szabo; PostgreSQL-development
Subject: Re: [HACKERS] [GENERAL] 7.4Beta
Dann Corbit wrote:
Simplification of bulk operations can be very important forcustomers
(on the other hand). For the CONNX tool set, we offer an escape on INSERT/SELECT that performs the operation in bulk mode.not being
There are serious downsides to bulk operations also (such as
logged and therefore existing outside of a transaction). Therefore, they are useful really only in cases where a completedestruction and
repopulation is called for. If anything goes haywire, youcan't simply
roll it back.loading) can
Yet the speed savings can be enormous (orders of magnitude).
Compared to iteration over a set of prepared inserts, a bulk insert (such as using Microsoft's BCP API or Oracles Direct Path
be 100 times faster. If you are moving gigabytes of data and performing a complete refresh, the method to use becomes obvious.like this:
When we go outside of the language bounds, a curly braced escape notation is used. For instance, an insert/select might look
INSERT INTO <destination_table> SELECT <column_list> FROM <source_table> {fn commitcount 1000} {bulkmode} The commitcount says
to use batches of 1000 rows and bulkmode says to use the fastest possible insert method.When saying "bulk operation" I don't necessarily mean using bulk load or stuff like that. What I mean is handling large amounts of similar data at the same time. That doesn't say anything about transactions or logging problems.
Imagine (for instance) that PostgreSQL can use an ODBC {or similar} data source as a table. Then, with a syntax such as the above (or SELECT INTO etc.), you could very rapidly move data from one system into another.
Imagine you have 100k or rows to load, each having FKs to (hopefully) existing rows
- Step 1: load the rows into a temp table
- Step 2: identify duplicates, logging and deleting them
- Step 3: insert all data satisfying the FK constraints.
- Step 4: log all that didn't insert.
This are relatively few statements (not the simplest), which can be handled in a safe manner.
Just an example, how a RDBMS can do the job in a mass oriented (and thus optimizable) way.
I really, really like SQL*Server's solution to the problem you have stated above. A typical example is you have a pile of incoming words and you want to create a dictionary. An example might be a library database, where you have an abstract for each item. You parse the words and look for new ones to add.
What SQL*Server allows is an index with an option called "IGNORE
DUPLICATES" that simply tosses out rows that are already in the table.
For applications like what I have described and what you have described
it is an incredibly useful extension. Once I got used to it, I found
myself using it all the time. Of course, you must be very careful to
ensure that the duplicates really are completely unimportant.
Hm, option is quite a quick hack. I like logging in the database much better than some sql messages. And it's only about duplicates, not about fk constraint violations.
Regards, Andresa
---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings