Hi Mike,

Those tables aren't that big for what you're doing (which is about how
I'd do it if I wasn't using built-in full-text :-)).

How many results are your searches returning? How long are the queries
taking?

The C table: Do you need the index on content_id? For deletes or
something? Doesn't really matter for speed, but you might as well dump
it if it's not used. What's the PRIMARY id there for? Just to have an id
column? :-) It looks like you can get rid of it. I'd make a composite
PRIMARY KEY on (stem_word_id, content_id). Otherwise make it UNIQUE if
you *need* the PRIMARY id. This will make EXPLAIN say "Using index" on C
for searches which will save a lot of random disk seeks to the data
file.

Can I ask what the problems are with MySQL's built-in full-text search?
I know there's a few since I've encountered them too, but I have some
ideas to work around them. Unfortunately, one that would be hard to work
around is stemming (waiting for that to be implemented internally).
-( Or are you just doing stemming to save space in the index and not for
functionality?


Hope that helps.


Matt


----- Original Message -----
From: "Mike Boone"
Sent: Wednesday, November 19, 2003 12:08 PM
Subject: Optimizing Custom Full Text Index


> Hi all,
>
> I have used PHP and MySQL for years but have never tackled the amount
of
> data that I now have (and it's expected to grow considerably). My
queries
> run OK when the server is not busy but they just about kill the DB
when
> traffic picks up.
>
> My apologies if this is common knowledge...I've had trouble searching
on
> custom full text indexing because it generally brings up hits
regarding the
> built-in full text indexing for various DB servers. MySQL's built-in
> fulltext doesn't quite do what we want.
>
> We are currently running MySQL 4.0.16 compiled with LinuxThreads on
FreeBSD.
>
>
> Basically, I'm trying to optimize a search involving three tables.
>
> Table A: (content table...currently nearly 40,000 rows and 62 MB)
>  id UNSIGNED INT PRIMARY
>  status VARCHAR 10 INDEXED
>  category VARCHAR 20 INDEXED
>  content LONGTEXT
>  + other fields
>
> Table B: (stem word index...instead of indexing the exact word, I just
keep
> the stem, so 'car' and 'cars' are both stored as 'car'. Table
currently has
> about 180,000 rows and is 9 MB)
>  id UNSIGNED INT PRIMARY
>  stem_word VARCHAR 30 INDEXED
>
> Table C: (full text index...currently about 4.5 million rows and 186
MB)
>  id UNSIGNED INT PRIMARY
>  stem_word_id (references id in table B) UNSIGNED INT INDEXED
>  content_id (references id in table A) UNSIGNED INT INDEXED
>
>
> Here's how I perform the search right now.
>
> The user enters keywords. I turn those words into a list of unique
stems. I
> then search for the stem IDs from Table B using the following query:
>
> SELECT id FROM B WHERE stem_word IN ('truck','piano','move');
>
> Using the IDs from that query (say 10, 20, 30), I run the following
query:
>
> SELECT C.content_id, COUNT(C.content_id), A.category FROM A, C WHERE
> C.content_id=A.id AND A.status='Active' AND (C.stem_word_id IN
(10,20,30))
> GROUP BY C.content_id HAVING Count(C.content_id)=3;
>
> I have recently also tried this query, which is a little cleaner
without the
> count/having stuff, but it seems about the same speed-wise:
>
> SELECT DISTINCT C0.content_id, A.category FROM A, C C0, C C1, C C2
WHERE
> C0.content_id=A.id AND A.status='Active' AND C0.stem_word_id=10 AND
> C1.stem_word_id=20 AND C0.content_id=C1.content_id AND C2.stem_word=30
AND
> C0.content_id=C1.content_id;
>
> When running the EXPLAIN on both queries, both are doing 'using where;
using
> temporary; using filesort' on table C. I'm not sure how to avoid that.
>
> This system has 512MB and I'm basically using the my-large.cnf file
as-is.
> Running mytop shows that the key efficiency is 100%, and (cache?)
"ratio"
> around 36%. All my tables are MyISAM right now. I tried switching to
InnoDB
> but it was much slower and I figured there were enough variables to
> troubleshoot already without playing around with the InnoDB
parameters.
>
> So my questions:
>
> 1. Is there anything blatantly wrong with my queries?
> 2. Should I have designed my index table differently?
> 3. Any parameter in my.cnf I should modify to be different from the
> my-large.cnf settings?
> 4. Any web resources with instructions for building customized full
text
> indexing...not using built-in stuff?
> 5. Since the content field of table A is only used for display (since
the
> words have been indexed), I was considering compressing the text in
that
> field so save DB disk space. Is that worth the effort?
>
> Any input is appreciated. Thanks for your help.
>
> Mike Boone
> (reply to the list or contact me directly at:
> http://boonedocks.net/mailmike.php3)


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to