In message <i96squ$aq...@localhost.localdomain>, Martin Gregorie wrote:
> On Thu, 14 Oct 2010 16:36:34 +1300, Lawrence D'Oliveiro wrote:
>
>> In message <4cb5e659$0$1650$742ec...@news.sonic.net>, John Nagle wrote:
>>
>>> Also note that there are some issues with doing a huge volume of
>>> updates in one MySQL InnoDB transaction. The system has to keep the
>>> data needed to undo the updates, and there's a limit on the amount of
>>> pending transaction history that can be stored.
>>
>> How does “load data” avoid this? Is that not a transaction too?
>>
> Not usually. It's faster because there's no journalling overhead.

So what happens if there’s an error part way? Do you end up with some
records inserted and others not?

> The loader takes out an exclusive table lock, dumps the data into the
> table, rebuilds indexes and releases the lock.

That’s not acceptable in a situation where other processes are
concurrently making queries on the table.

>> Seems to me this isn’t going to help, since both old and new tables are
>> on the same disk, after all. And it’s the disk access that’s the
>> bottleneck.
>>
> There's a lot of overhead in journalling - much more than in applying
> changes to a table. The before and after images *must* be flushed to disk
> on commit. In UNIX terms fsync() must be called on the journal file(s)
> and this is an expensive operation on all OSes because committing a
> series of small transactions can cause the same disk block to be written
> several times. However, the table pages can safely be left in the DBMS
> cache and flushed as part of normal cache operation since, after a crash,
> the table changes can always be recovered from a journal roll-forward. A
> good DBMS will do that automatically when it's restarted.

And how does this address the point of whether to load the new records
into a new table or the same table?

--
http://mail.python.org/mailman/listinfo/python-list
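As an aside, the fsync() cost Martin describes is easy to see outside any
DBMS. A rough Python sketch (record size, record count, and the function
name are made up purely for illustration — this stands in for a journal
file, not real MySQL behaviour):

```python
import os
import tempfile
import time


def write_records(n, fsync_each):
    """Write n small records; fsync after each one, or once at the end.

    fsync_each=True mimics committing n tiny transactions (journal
    flushed on every commit); False mimics one big batched commit.
    """
    fd, path = tempfile.mkstemp()
    try:
        start = time.time()
        for _ in range(n):
            os.write(fd, b"x" * 100)   # one small "transaction"
            if fsync_each:
                os.fsync(fd)           # force the journal block to disk now
        if not fsync_each:
            os.fsync(fd)               # single flush for the whole batch
        return time.time() - start
    finally:
        os.close(fd)
        os.unlink(path)


print("fsync per record:", write_records(200, True))
print("one fsync total: ", write_records(200, False))
```

On most systems the per-record variant is dramatically slower, because the
same disk block gets forced out repeatedly — which is the overhead that a
journalled INSERT path pays and a bulk loader skips.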