Follow-up at bottom:

On Tuesday 01 November 2005 04:15 pm, Hal Vaughan wrote:
> On Tuesday 01 November 2005 03:35 pm, [EMAIL PROTECTED] wrote:
> > Which tables can have duplicate records in them should be something
> > decided BEFORE you begin to populate the tables. 99.999% of the time,
> > each row of any one table should be different from every other row on the
> > same table. That difference may exist as a combination of values or as a
> > single value but it is NEARLY ALWAYS in your best interest to have no two
> > rows identical.  It's not too late to make this decision. However, by
> > putting it off for so long, implementing your uniqueness rules may be
> > much more difficult.
>
> That's a problem with being self-taught -- there are many things like that
> one can miss.  I was not aware that indexing sped things up.  As for the
> design -- I always knew which tables required unique values, but it was a
> matter of what I had time to do and when I could do it.  The priority was
> to get the system working, and make sure all the smaller programs did their
> jobs and played nicely together.  Now that everything works, I can spare
> the time to write code that will recognize, when tables are created, which
> ones need indexing.
>
> > >... So now that I can start using keys, I have a few
> > > questions:
> >
> > Keys make finding records much faster. You should probably have created a
> > few long before now.
> >
> > > 1) I've been using "SELECT FirstName, LastName, Birthdate FROM division.People
> > > WHERE FirstName = '$fname' AND LastName = '$lname' AND Birthdate = '$bday'"
> > > to check for preexisting records.  This means before I inserted a record, I
> > > would select on specific fields and see if they matched the fields of the
> > > current record.  If they did, I threw out the record I was going to enter, if
> > > there was not a match, I'd INSERT the new record.  If I use "INSERT
> > > IGNORE...", MySQL will still have to verify that the new record does not
> > > match any old records.  How much faster is it to do it that way than the way
> > > I was?  I'd think the same routines to find matching data would be used.
> >
> > No, the same routine will not be used.
> >
> > No, the same routine will not be used.
> >
> > A hash of the values of the columns that participate in each PRIMARY KEY
> > or UNIQUE KEY will be computed for each row (and stored as part of the
> > key's index structure) and also for each new row as it is being INSERTed.
> > If the new row's hash matches the hash of any UNIQUE or PRIMARY KEY an
> > error will be thrown by the server. An INSERT with the IGNORE modifier
> > will ignore that particular error (basically skipping that row) and
> > continue inserting records.
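[For anyone following along, here is a minimal sketch of the mechanism described above, using the People columns mentioned elsewhere in the thread -- the key name and sample values are made up for illustration:]

```sql
CREATE TABLE People (
    FirstName VARCHAR(50),
    LastName  VARCHAR(50),
    Birthdate DATE,
    UNIQUE KEY uq_person (FirstName, LastName, Birthdate)
);

-- First insert succeeds.
INSERT INTO People VALUES ('Hal', 'Vaughan', '1960-01-01');

-- A plain INSERT of the same row raises a duplicate-key error
-- (ER_DUP_ENTRY); with IGNORE the row is skipped and the
-- statement continues with any remaining rows:
INSERT IGNORE INTO People VALUES ('Hal', 'Vaughan', '1960-01-01');
```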
>
> Okay -- great.  That also answers my last question -- I needed to be sure
> IGNORE did what I thought it did.
>
> > > 2) Right now I'm creating an archival database to store older records in.  In
> > > one of these archives, there are over 250,000 records.  At this point, by
> > > doing things the "old way" (checking for a match, then inserting), it is now
> > > able to insert about 750 records in 10 minutes.  Earlier, when it was only
> > > about 180,000 records, it was inserting at about 1,000 records in 10 minutes.
> > > So, first, how much of a speed-up can I count on if I use "INSERT IGNORE"
> > > instead,...
> >
> > none. The speed-up will come from the combination of INSERT IGNORE and
> > the UNIQUE or PRIMARY key(s) defined on the target table. It will not
> > come from simply changing INSERT to INSERT IGNORE. In fact, defining ANY
> > keys on that table will cut your processing time considerably. You should
> > be able to insert several hundred records every second (1000s per
> > minute). Your total lack of all indexes has absolutely killed your
> > performance.
>
> So, unless I misunderstand, adding both keys and IGNORE will speed things
> up by a factor of 100 to 1000 or more.  That is a huge relief.  (Again, the
> first step was developing the system and making sure it works; now I'm
> speeding it up and adding other useful features, like keys.)
>
> > >... and second, if I use "INSERT IGNORE DELAYED", will the program
> > > finish faster, leaving MySQL to catch up with the queued INSERT
> > > statements as it can?
> >
> > The DELAYED modifier asks the server to buffer your INSERTs so that they
> > can be interleaved with any active read requests allowing your client to
> > believe it has finished inserting records much sooner than if it had
> > waited on all of those inserts to actually happen. It should not be
> > necessary to use DELAYED under most circumstances. You will need to
> > benchmark both techniques to determine which one works best for your
> > situation.
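[A quick sketch of the syntax -- note that in the MySQL manual's grammar the DELAYED modifier comes before IGNORE; the values here are made up for illustration:]

```sql
-- DELAYED asks the server to queue the row in memory and return
-- control to the client immediately; the row is written when the
-- table is not busy.  The queue lives only in server memory, so
-- queued rows can be lost if the server crashes before they are
-- flushed to the table.
INSERT DELAYED IGNORE INTO People (FirstName, LastName, Birthdate)
VALUES ('Jane', 'Doe', '1980-05-05');
```

As the reply says, benchmark with and without DELAYED before relying on it.]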
>
> Which leads to another question: If my program thinks the data is inserted,
> but it is delayed, is the queue of DELAYED statements kept anywhere -- so
> that if MySQL or the system crashes, none of the DELAYED transactions are
> lost?
>
> > > 3) While this has been stated on mysql.com, it is not what I was
> > > originally taught by a friend and some books, so I want to be clear I
> > > understand this correctly before I depend on it.  As I understand it, "INSERT
> > > IGNORE..." compares the data being inserted with the keys of all records in
> > > the table and will not insert it if it duplicates an existing multi-column
> > > key. ...
> >
> > Then you understand incorrectly. Only two types of keys have any effect
> > on INSERT IGNORE: UNIQUE and PRIMARY. Regular indexes have no effect on
> > uniqueness.
> >
> > If you have a PRIMARY KEY or a UNIQUE INDEX (KEY is interchangeable with
> > INDEX) defined on a table, then all rows within that table must be
> > different from all other rows in the same table for those columns or
> > combination of columns you used to define the key. UNIQUE and PRIMARY
> > KEYs are just like regular indexes except they have a little extra
> > muscle: they define for the database what to look for in order to reject
> > duplicate records (that is known as a "uniqueness constraint"). Normal
> > (regular) indexes (keys) speed up queries. UNIQUE and PRIMARY KEYs do
> > that, too, but also protect your data from duplication.
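[The distinction above, in DDL terms -- the index names here are arbitrary:]

```sql
-- A regular index: speeds up lookups, but allows duplicates.
CREATE INDEX idx_lastname ON People (LastName);

-- A UNIQUE index: speeds up lookups AND rejects any row whose
-- (FirstName, LastName, Birthdate) combination already exists --
-- this is the uniqueness constraint that INSERT IGNORE relies on.
CREATE UNIQUE INDEX uq_person ON People (FirstName, LastName, Birthdate);
```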
> >
> > >...  The IGNORE basically tells MySQL to not generate an error message if the
> > > data I'm inserting is a duplicate, so I can use "INSERT IGNORE..." in a Perl
> > > program to be sure I'm not duplicating records and not getting error messages
> > > on it if the data is a duplicate.  Is this correct?
> >
> > No, the INSERT IGNORE statement does not enforce the uniqueness of a new
> > record, your UNIQUE and PRIMARY KEYs do that. Using the IGNORE modifier
> > tells the server to simply ignore the error raised by violating a
> > uniqueness constraint.
>
> I worded it wrong, so I was not clear about what I was thinking (coding
> 10-18 hours a day -- and then debugging -- since last February is catching
> up with me).  You did, though, answer the question and give me what I
> needed, in spite of my poorly worded question.
>
> So it sounds like adding the keys and using IGNORE on my INSERTs will make
> a huge difference -- as well as simplifying my code.
>
> Thank you!  This helps a LOT!
>
> Hal

It only took an hour or two to add indexes to the tables.  Now I'm going 
through and modifying my code to recognize tables which can be indexed (which 
will take a while).  The one database with 283,903 possible records (that 
list included dupes to be screened out when data was inserted) had, under my 
old code, taken about 40 hours to run.  When I removed my code to select the 
fields and compare, and instead used a 3-column key on the same fields, 
dropped the compare routine, and used INSERT IGNORE..., it made a big 
difference.  The 40-hour (approximate -- not sure of the exact time) task ran 
in just over 11 minutes.  Just for fun, I punched it into a calculator, and it 
turns out it ran approximately 210 times faster with keys and INSERT IGNORE 
than with my old code selecting the fields that needed to be unique and 
comparing them to the new data.  I had to repopulate 6 tables.  On the long 
table, on the first run, toward the end it was processing 1-2 queries a 
second.  With keys, almost the same code processed anywhere from 318 to 
679 records a second.
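[For the archives, the conversion described above amounts to something like this -- the key name is a guess, but the table and columns come from the SELECT quoted earlier in the thread:]

```sql
-- One-time change: a 3-column UNIQUE key over the fields that
-- must not repeat.
ALTER TABLE division.People
    ADD UNIQUE KEY uq_person (FirstName, LastName, Birthdate);

-- Per record, the old SELECT-compare-INSERT round trip becomes
-- a single statement; duplicates are silently skipped.
INSERT IGNORE INTO division.People (FirstName, LastName, Birthdate)
VALUES ('$fname', '$lname', '$bday');
```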

Thanks!

Hal

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
