Peter Rabbitson wrote:
> Hello everyone,
> I want someone to share his expertise on the following:
> Suppose we have a table with some data that periodically must be synced 
> with an external source, which provides only partial information (e.g. 
> it might provide all the fields for a new record, but might provide 
> only a single field difference for an already existing record). This 
> obviously will involve a series of UPDATE and INSERT statements, mostly 
> in random order. As most RDBMS can hold only one prepared statement at a 
> time (including MySQL) I have 2 ways to do this that look somewhat 
> efficient:
>

What version of MySQL? Generally this would be handled with a
transaction, but only newer MySQLs (with a transactional table type
such as InnoDB) support them. And I suspect Randal (if he is listening)
will pop in with a comment about using PostgreSQL right about now :-).
Having used both, I would agree with him.
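
If your MySQL does support them, the wrapping itself is straightforward;
here is a rough, untested sketch (DSN, credentials and the actual sync
statements are just placeholders):

  use strict;
  use warnings;
  use DBI;

  # DSN, user and password are placeholders; AutoCommit is off so the
  # whole sync runs as a single transaction (needs InnoDB tables).
  my $dbh = DBI->connect( 'dbi:mysql:database=mydb', 'user', 'password',
      { RaiseError => 1, AutoCommit => 0 } );

  eval {
      # ... the series of INSERTs and UPDATEs for the sync goes here ...
      $dbh->commit;
  };
  if ($@) {
      warn "sync failed, rolling back: $@";
      $dbh->rollback;
  }
  $dbh->disconnect;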

> 1. Sweep the pre-existing table into a hash, DELETE all the records from 
> it and INSERT the new data, using the hash as a source for missing 
> fields.
> 

Yeah, this sounds like a nightmare.

> 2. Create 3 connections to the database, have the first one hold a 
> SELECT statement that will replace the hash above, have the second 
> connection be an INSERT for nonexisting records and the third connection 
> be an UPDATE for existing ones
> 

As the other posters said, and to my knowledge, you should double-check
the assumption that you can only hold one prepared statement at a time.
From the DBI docs:

http://search.cpan.org/~timb/DBI-1.48/DBI.pm#prepare

"Portable applications should not assume that a new statement can be
prepared and/or executed while still fetching results from a previous
statement."

That is a "should not assume" rather than a "can't do it", and it is
specifically aimed at "portable applications", which generally means
code that has to run against multiple different db backends. In
practice DBD::mysql generally copes fine with several prepared handles
on one connection as long as you fetch complete result sets, but do
check for your setup. You might also want to have a read through the
'prepare_cached' section following the above; it caches statement
handles so that re-preparing the same SQL costs next to nothing.
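
Something along these lines (untested; $dbh, the 'items' table and
@incoming are made up for illustration) is what I mean by keeping
several handles prepared on one connection:

  use strict;
  use warnings;

  # $dbh: an already connected DBI handle (as above). The table 'items'
  # and its columns are invented; @incoming stands in for the parsed feed.
  my @incoming = (
      { id => 1, name => 'widget', price => 9.99 },   # full record
      { id => 2, price => 4.50 },                     # partial record
  );

  my $sel = $dbh->prepare('SELECT name, price FROM items WHERE id = ?');
  my $upd = $dbh->prepare('UPDATE items SET name = ?, price = ? WHERE id = ?');
  my $ins = $dbh->prepare('INSERT INTO items (id, name, price) VALUES (?, ?, ?)');

  for my $rec (@incoming) {
      $sel->execute( $rec->{id} );
      my $existing = $sel->fetchrow_hashref;
      $sel->finish;

      if ($existing) {
          # Fall back to the stored values for fields the feed left out.
          my $name  = defined $rec->{name}  ? $rec->{name}  : $existing->{name};
          my $price = defined $rec->{price} ? $rec->{price} : $existing->{price};
          $upd->execute( $name, $price, $rec->{id} );
      }
      else {
          $ins->execute( $rec->{id}, $rec->{name}, $rec->{price} );
      }
  }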

> From reading on DBI I understood that moving lots of data with do()
> statements is pretty slow (e.g. constantly re-preparing the same 
> statement), so I think scenario 2 with a single connection is even 
> worse.
> 

Slow is only meaningful in your context; until you benchmark it you
won't know whether it really is too slow.
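
If you want actual numbers, the standard Benchmark module makes the
comparison easy enough; a rough sketch (throwaway table name invented,
$dbh as above) pitting do() against prepare-once/execute-many:

  use strict;
  use warnings;
  use Benchmark qw(timethese);

  # $dbh as above; 'bench_items (id INT, val VARCHAR(64))' is a throwaway
  # table invented for the test, with no unique keys.
  my @rows = map { [ $_, "value_$_" ] } 1 .. 1000;

  timethese( 10, {
      'do'      => sub {
          $dbh->do( 'INSERT INTO bench_items (id, val) VALUES (?, ?)', undef, @$_ )
              for @rows;
      },
      'prepare' => sub {
          my $sth = $dbh->prepare('INSERT INTO bench_items (id, val) VALUES (?, ?)');
          $sth->execute(@$_) for @rows;
      },
  } );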

> The reason I am elaborating on this is that I am afraid the dataset will 
> stop fitting in memory at some point of time if I go with scenario 1 
> (which is easier and cleaner to implement). Also I have no idea how 
> resource intensive scenario 2 would be, although I have proper 
> indexing and stuff so the UPDATES should be fast enough...
>

Easier I would agree with; cleaner I definitely wouldn't, at least not
if we are talking about enterprise-level stuff. There are all kinds of
issues you can run into when doing a DELETE/INSERT instead of the more
appropriate UPDATE (foreign keys, triggers, auto-increment values, and
the window where the data simply isn't in the table), especially when
there is an intermediary (your Perl hash) involved.
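
If you stay with UPDATEs, the partial data isn't that painful either.
As a rough, untested sketch (table and column names invented), you can
build the SET clause from only the fields the source actually sent, and
prepare_cached keeps the re-preparing cheap:

  use strict;
  use warnings;

  # $rec is a hashref of whatever fields the source sent for an existing
  # record, keyed off 'id'; 'items' and its columns are again made up.
  sub update_partial {
      my ( $dbh, $rec ) = @_;
      my %data = %$rec;
      my $id   = delete $data{id};
      return unless %data;    # nothing new to write

      # In real code, check these keys against a known column list first.
      my @cols = sort keys %data;
      my $set  = join ', ', map { "$_ = ?" } @cols;
      my $sth  = $dbh->prepare_cached("UPDATE items SET $set WHERE id = ?");
      $sth->execute( @data{@cols}, $id );
  }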

> Thanks
> 
> Peter
> 

HTH some,

http://danconia.org
