Comments embedded. See below

Hal Vaughan <[EMAIL PROTECTED]> wrote on 11/01/2005 02:50:13 PM:

> I want to be sure I understand "INSERT IGNORE..." correctly before I 
start 
> depending on it.  Up until now, I have not been using any kind of key or 

> unique index, since many of my tables are created automatically and, 
until 
> now, it has been difficult for me to create a way to distinguish between 
the 
> tables that would have multiple matching records and the tables that 
need to 
> have only unique values. ... 

Which tables can have duplicate records in them should be something 
decided BEFORE you begin to populate the tables. 99.999% of the time, each 
row of any one table should be different from every other row on the same 
table. That difference may exist as a combination of values or as a single 
value but it is NEARLY ALWAYS in your best interest to have no two rows 
identical.  It's not too late to make this decision. However, by putting 
it off for so long, implementing your uniqueness rules may be much more 
difficult.


>... So now that I can start using keys, I have a few 
> questions:
> 

Keys make finding records much faster. You should probably have created a 
few long before now.


> 1) I've been using "SELECT FirstName, LastName, Birthdate FROM 
> division.People 
> WHERE FirstName = '$fname' AND LastName = '$lname' AND Birthdate = 
'$bday'" 
> to check for preexisting records.  This means before I inserted a 
record, I 
> would select on specific fields and see if they matched the fields of 
the 
> current record.  If they did, I threw out the record I was going to 
enter, if 
> there was not a match, I'd INSERT the new record.  If I use "INSERT 
> IGNORE...", MySQL will still have to verify that the new record does not 

> match any old records.  How much faster is it  to do it that way than 
the way 
> I was?  I'd think the same routines to find matching data would be used.

No, the same routine will not be used. 

A hash of the values of the columns that participate in each PRIMARY KEY 
or UNIQUE KEY will be computed for each row (and stored as part of the 
key's index structure) and also for each new row as it is being INSERTed. 
If the new row's hash matches the hash of any UNIQUE or PRIMARY KEY an 
error will be thrown by the server. An INSERT with the IGNORE modifier 
will ignore that particular error (basically skipping that row) and 
continue inserting records.

> 
> 2) Right now I'm creating an archival database to store older records 
in.  In 
> one of these archives, there are over 250,000 records.   At this point, 
by 
> doing things the "old way" (checking for a match, then inserting), it is 
now 
> able to insert about 750 records in 10 minutes.  Earlier, when it was 
only 
> about 180,000 records, it was inserting at about 1,000 records in 
10minutes. 
> So, first, how much of a speed up can I count on if I use "INSERT 
IGNORE" 
> instead,... 

none. The speed-up will come from the combination of INSERT IGNORE and the 
UNIQUE or PRIMARY key(s) defined on the target table. It will not come 
from simply changing INSERT to INSERT IGNORE. In fact, defining ANY keys 
on that table will cut your processing time considerably. You should be 
able to insert several hundred records every second (1000s per minute). 
Your total lack of all indexes has absolutely killed your performance.

>... and second, if I use "INSERT IGNORE DELAYED", will the program 
> finish faster, leaving MySQL to catch up with the queued INSERT 
statements as 
> it can?

The DELAYED modifier asks the server to buffer your INSERTs so that they 
can be interleaved with any active read requests allowing your client to 
believe it has finished inserting records much sooner than if it had 
waited on all of those inserts to actually happen. It should not be 
necessary to use DELAYED under most circumstances. You will need to 
benchmark both techniques to determine which one works best for your 
situation.

> 
> 3) While this has been stated on the mysql.com, it is not what I was 
> originally taught by a friend and some books, so I want to be clear I 
> understand this correctly before I depend on it.  As I understand it, 
"INSERT 
> IGNORE..." compares the data being inserted with the keys of all records 
in 
> the table and will not insert it if it duplicates an existing 
multi-column 
> key. ... 

Then you understand incorrectly. Only two types of keys have any affect on 
INSERT IGNORE: UNIQUE and PRIMARY. Regular indexes have no effect on 
uniqueness.

If you have a PRIMARY KEY or a UNIQUE INDEX (KEY is interchangeable with 
INDEX) defined on a table, then all rows within that table must be 
different from all other rows in the same table for those columns or 
combination of columns you used to define the key. UNIQUE and PRIMARY KEYs 
are just like regular indexes except they have a little extra muscle: they 
define for the database what to look for in order to reject duplicate 
records (that is known as a "uniqueness constraint"). Normal (regular) 
indexes (keys) speed up queries. UNIQUE and PRIMARY KEYs do that, too, but 
also protect your data from duplication.

>...  The IGNORE basically tells MySQL to not generate an error message if 
the 
> data I'm inserting is a duplicate, so I can use "INSERT IGNORE..." in a 
Perl 
> program to be sure I'm not duplicating records and not getting 
errormessages 
> on it if the data is a duplicate.  Is this correct?

No, the INSERT IGNORE statement does not enforce the uniqueness of a new 
record, your UNIQUE and PRIMARY KEYs do that. Using the IGNORE modifier 
tells the server to simply ignore the error raised by violating a 
uniqueness constraint.

> 
> Thanks!
> 
> Hal
> 

For additional information:
http://dev.mysql.com/doc/refman/4.1/en/create-table.html
http://dev.mysql.com/doc/refman/4.1/en/insert.html

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Reply via email to