Comments embedded. See below Hal Vaughan <[EMAIL PROTECTED]> wrote on 11/01/2005 02:50:13 PM:
> I want to be sure I understand "INSERT IGNORE..." correctly before I start > depending on it. Up until now, I have not been using any kind of key or > unique index, since many of my tables are created automatically and, until > now, it has been difficult for me to create a way to distinguish between the > tables that would have multiple matching records and the tables that need to > have only unique values. ... Which tables can have duplicate records in them should be something decided BEFORE you begin to populate the tables. 99.999% of the time, each row of any one table should be different from every other row on the same table. That difference may exist as a combination of values or as a single value but it is NEARLY ALWAYS in your best interest to have no two rows identical. It's not too late to make this decision. However, by putting it off for so long, implementing your uniqueness rules may be much more difficult. >... So now that I can start using keys, I have a few > questions: > Keys make finding records much faster. You should probably have created a few long before now. > 1) I've been using "SELECT FirstName, LastName, Birthdate FROM > division.People > WHERE FirstName = '$fname' AND LastName = '$lname' AND Birthdate = '$bday'" > to check for preexisting records. This means before I inserted a record, I > would select on specific fields and see if they matched the fields of the > current record. If they did, I threw out the record I was going to enter, if > there was not a match, I'd INSERT the new record. If I use "INSERT > IGNORE...", MySQL will still have to verify that the new record does not > match any old records. How much faster is it to do it that way than the way > I was? I'd think the same routines to find matching data would be used. No, the same routine will not be used. A hash of the values of the columns that participate in each PRIMARY KEY or UNIQUE KEY will be computed for each row (and stored as part of the key's index structure) and also for each new row as it is being INSERTed. If the new row's hash matches the hash of any UNIQUE or PRIMARY KEY an error will be thrown by the server. An INSERT with the IGNORE modifier will ignore that particular error (basically skipping that row) and continue inserting records. > > 2) Right now I'm creating an archival database to store older records in. In > one of these archives, there are over 250,000 records. At this point, by > doing things the "old way" (checking for a match, then inserting), it is now > able to insert about 750 records in 10 minutes. Earlier, when it was only > about 180,000 records, it was inserting at about 1,000 records in 10minutes. > So, first, how much of a speed up can I count on if I use "INSERT IGNORE" > instead,... none. The speed-up will come from the combination of INSERT IGNORE and the UNIQUE or PRIMARY key(s) defined on the target table. It will not come from simply changing INSERT to INSERT IGNORE. In fact, defining ANY keys on that table will cut your processing time considerably. You should be able to insert several hundred records every second (1000s per minute). Your total lack of all indexes has absolutely killed your performance. >... and second, if I use "INSERT IGNORE DELAYED", will the program > finish faster, leaving MySQL to catch up with the queued INSERT statements as > it can? The DELAYED modifier asks the server to buffer your INSERTs so that they can be interleaved with any active read requests allowing your client to believe it has finished inserting records much sooner than if it had waited on all of those inserts to actually happen. It should not be necessary to use DELAYED under most circumstances. You will need to benchmark both techniques to determine which one works best for your situation. > > 3) While this has been stated on the mysql.com, it is not what I was > originally taught by a friend and some books, so I want to be clear I > understand this correctly before I depend on it. As I understand it, "INSERT > IGNORE..." compares the data being inserted with the keys of all records in > the table and will not insert it if it duplicates an existing multi-column > key. ... Then you understand incorrectly. Only two types of keys have any affect on INSERT IGNORE: UNIQUE and PRIMARY. Regular indexes have no effect on uniqueness. If you have a PRIMARY KEY or a UNIQUE INDEX (KEY is interchangeable with INDEX) defined on a table, then all rows within that table must be different from all other rows in the same table for those columns or combination of columns you used to define the key. UNIQUE and PRIMARY KEYs are just like regular indexes except they have a little extra muscle: they define for the database what to look for in order to reject duplicate records (that is known as a "uniqueness constraint"). Normal (regular) indexes (keys) speed up queries. UNIQUE and PRIMARY KEYs do that, too, but also protect your data from duplication. >... The IGNORE basically tells MySQL to not generate an error message if the > data I'm inserting is a duplicate, so I can use "INSERT IGNORE..." in a Perl > program to be sure I'm not duplicating records and not getting errormessages > on it if the data is a duplicate. Is this correct? No, the INSERT IGNORE statement does not enforce the uniqueness of a new record, your UNIQUE and PRIMARY KEYs do that. Using the IGNORE modifier tells the server to simply ignore the error raised by violating a uniqueness constraint. > > Thanks! > > Hal > For additional information: http://dev.mysql.com/doc/refman/4.1/en/create-table.html http://dev.mysql.com/doc/refman/4.1/en/insert.html Shawn Green Database Administrator Unimin Corporation - Spruce Pine