Peter Rabbitson wrote:
> Hello everyone,
> I want someone to share his expertise on the following:
> Suppose we have a table with some data that periodically must be synced
> with an external source, which provides only partial information (e.g.
> it might provide all the fields for a new record, but might provide
> only a single field difference for an already existing record). This
> obviously will involve a series of UPDATE and INSERT statements, mostly
> in random order. As most RDBMS can hold only one prepared statement at a
> time (including MySQL) I have 2 ways to do this that look somewhat
> efficient:
>
What version of MySQL? Generally this would be handled with a transaction,
but only newer MySQLs support them. And I suspect Randal (if he is
listening) will pop in with a comment about using PostgreSQL right about
now :-). And now, having used both, I would agree with him.

> 1. Sweep the pre-existing table into a hash, DELETE all the records from
> it and INSERT the new data, using the hash as a source for missing
> fields.
>

Yeh, this sounds like a nightmare.

> 2. Create 3 connections to the database, have the first one hold a
> SELECT statement that will replace the hash above, have the second
> connection be an INSERT for nonexisting records and the third connection
> be an UPDATE for existing ones
>

As the other posters said, and to my knowledge, you should double check
that you really can't prepare multiple statements. From the DBI docs:

http://search.cpan.org/~timb/DBI-1.48/DBI.pm#prepare

"Portable applications should not assume that a new statement can be
prepared and/or executed while still fetching results from a previous
statement."

That is a "should not assume" rather than a "can't do it", and it is aimed
specifically at "portable applications", which generally means code that
has to run against multiple different database backends. You might also
want to read through the 'prepare_cached' section that follows the above;
it appears to be DBI's way of handling the same situation when the backend
doesn't support it.

> From reading on DBI I understood that moving lots of data with do()
> statements is pretty slow (e.g. constantly re-preparing the same
> statement), so I think scenario 2 with a single connection is even
> worse.
>

"Slow" is only meaningful in your context; until you benchmark it, you
won't know whether it really is too slow.

> The reason I am elaborating on this is that I am afraid the dataset will
> stop fitting in memory at some point of time if I go with scenario 1
> (which is easier and cleaner to implement). Also I have no idea how
> resource intensive scenario 2 would be, although I have proper
> indexing and stuff so the UPDATES should be fast enough...
>

Easier I would agree with; cleaner I definitely wouldn't, at least not if
we are talking about enterprise-level stuff. There are all kinds of issues
you can run into when trying to do a DELETE/INSERT instead of the more
appropriate UPDATE, especially when there is an intermediary (your Perl
hash) involved.

> Thanks
>
> Peter
>

HTH some,

http://danconia.org
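
P.S. Since this sort of thing comes up a lot, here is a rough, untested
sketch of what I have in mind: prepare the SELECT/UPDATE/INSERT handles
once on a single $dbh, reuse them for every incoming row, and wrap the
whole sync in one transaction. The DSN and the table/column names (items,
id, name, qty) are made up, so adjust them to your schema. It also assumes
two things worth checking: that your driver lets you keep several statement
handles prepared on one connection (the very thing the DBI docs above tell
you to double check), and that your tables use a storage engine that
actually supports transactions (InnoDB on newer MySQLs, or PostgreSQL).
COALESCE is just one way of applying a partial record; use whatever fits.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # AutoCommit => 0 starts an explicit transaction; this needs a
    # transactional engine (InnoDB on MySQL, or PostgreSQL).
    my $dbh = DBI->connect(
        'dbi:mysql:database=mydb;host=localhost', 'user', 'password',
        { RaiseError => 1, AutoCommit => 0 },
    );

    # Prepared once, reused for every row -- no re-preparing as with do().
    # prepare_cached() would do the same job if you prefer it.
    my $sel = $dbh->prepare('SELECT 1 FROM items WHERE id = ?');
    my $upd = $dbh->prepare(
        'UPDATE items SET name = COALESCE(?, name), qty = COALESCE(?, qty)
         WHERE id = ?'
    );
    my $ins = $dbh->prepare(
        'INSERT INTO items (id, name, qty) VALUES (?, ?, ?)'
    );

    # Stand-in for the rows coming from the external source;
    # undef means "field not provided in this batch".
    my @incoming = (
        { id => 1, name => 'widget', qty => 10 },
        { id => 2, qty  => 7 },              # partial: only qty changed
    );

    eval {
        for my $row (@incoming) {
            $sel->execute( $row->{id} );
            if ( $sel->fetchrow_array ) {
                # Row exists: UPDATE, keeping old values for missing fields.
                $upd->execute( $row->{name}, $row->{qty}, $row->{id} );
            }
            else {
                # New row: INSERT whatever we were given.
                $ins->execute( $row->{id}, $row->{name}, $row->{qty} );
            }
            $sel->finish;
        }
        $dbh->commit;
    };
    if ($@) {
        warn "sync failed, rolling back: $@";
        $dbh->rollback;
    }

    $dbh->disconnect;

And on the "is do() really too slow" question, Benchmark ships with Perl
and makes the comparison trivial. Something along these lines (again, the
table name and the row count are just placeholders) will tell you more than
any guess:

    use strict;
    use warnings;
    use Benchmark qw(timethese);
    use DBI;

    my $dbh = DBI->connect( 'dbi:mysql:database=mydb', 'user', 'password',
                            { RaiseError => 1 } );
    my @ids = ( 1 .. 1000 );

    timethese( 10, {
        do_every_time => sub {
            # do() prepares and executes again on every pass
            $dbh->do( 'UPDATE items SET qty = qty WHERE id = ?', undef, $_ )
                for @ids;
        },
        prepare_once => sub {
            my $sth = $dbh->prepare('UPDATE items SET qty = qty WHERE id = ?');
            $sth->execute($_) for @ids;
        },
    } );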