Have you tried moving the "GroupID not IN ()" into your JOIN clause?
SELECT * FROM News JOIN FileGroup ON News.FileID = FileGroup.FileID AND FileGroup.GroupID NOT IN (638) WHERE News.Category = 7 GROUP BY News.FileID ORDER BY News.Date DESC LIMIT 100 I think, though, that because you are excluding only a small part of your FileGroup table, the optimizer might think that more than the "magic threshold" (somewhere around 30%) number of records will be returned and may still revert to a table scan. Another option could be to select all of the rows from FileGroup where the rows are IN the group to a temp table. Then LEFT JOIN the temp table to news and check for null values in the temporary table. CREATE TEMPORARY TABLE tmpFG SELECT FileID FROM FileGroup WHERE Group IN (638) ALTER TABLE tmpFG ADD KEY(FileID) SELECT * FROM News n JOIN FileGroup fg ON n.FileID = fg.FileID LEFT JOIN tmpFG tfg on fg.FileID = n.FileID WHERE n.Category = 7 AND tfg.fileID is null GROUP BY n.FileID ORDER BY n.Date DESC LIMIT 100 DROP TABLE tmpFG I suggest this as it is sometimes faster to find something then exclude what you find than to "not find" it the first time. Shawn Green Database Administrator Unimin Corporation - Spruce Pine Chris Elsworth <[EMAIL PROTECTED]> wrote on 07/19/2004 11:59:43 AM: > Hello, > > I wonder if someone could shed some light on a problem that's been bothering > me for months. Please bear with me as I explain it.. > > I have two tables in question: > > CREATE TABLE `News` ( > `FileID` int(8) unsigned NOT NULL auto_increment, > `Subject` char(200) NOT NULL default '', > `Category` tinyint(3) unsigned NOT NULL default '0', > `SubCategory` smallint(5) unsigned NOT NULL default '0', > `Date` int(10) unsigned NOT NULL default '0', > PRIMARY KEY (`FileID`), > KEY `S_D` (`SubCategory`,`Date`), > KEY `C_D_P` (`Category`,`Date`,`PostID`), > KEY `C_P_D` (`Category`,`PostID`,`Date`, > KEY `Subject` (`Subject`(10)), > KEY `C_D` (`Category`,`Date`), > FULLTEXT KEY `ft_Subject` (`Subject`) > ) TYPE=MyISAM > > CREATE TABLE `FileGroup` ( > `FileID` int(8) unsigned NOT NULL default '0', > `GroupID` smallint(5) unsigned NOT NULL default '0', > PRIMARY KEY (`GroupID`,`FileID`), > KEY `F_G` (`FileID`,`GroupID`), > ) TYPE=MyISAM > > > News has about 2.5m rows and FileGroup has 3.1m rows. > For each row in News, there are one or more corresponding rows in > FileGroup, determining in which newsgroups each News article appears. > > So, typically: > mysql> select * from News where FileID = 26222004\G > *************************** 1. row *************************** > FileID: 26222004 > Subject: some_usenet_post > Category: 7 > SubCategory: 706 > Date: 1090239675 > > mysql> select * from FileGroup where FileID = 26222004\G > *************************** 1. row *************************** > FileID: 26222004 > GroupID: 638 > *************************** 2. row *************************** > FileID: 26222004 > GroupID: 799 > > > The problem occurs when I want to get News rows that do not appear > in a specific group or set of groups. I also only want one row per News > article, not one row per group, so I have a GROUP BY (I could maybe use > DISTINCT too but they'd do pretty much the same) in there. > > I end up with something like this: > > SELECT * FROM News > JOIN FileGroup ON (News.FileID = FileGroup.FileID) > WHERE GroupID NOT IN (638) AND Category = 7 > GROUP BY News.FileID ORDER BY Date DESC > LIMIT 100 > > The resulting explain: > +-----------+------+-----------------------------+-------+--------- > +-------------+--------+----------------------------------------------+ > | table | type | possible_keys | key | key_len | > ref | rows | Extra | > +-----------+------+-----------------------------+-------+--------- > +-------------+--------+----------------------------------------------+ > | News | ref | PRIMARY,C_D_P,C_P_D,C_D | C_P_D | 1 | > const | 595494 | Using where; Using temporary; Using filesort | > | FileGroup | ref | F_G | F_G | 4 | > News.FileID | 1 | Using where | > +-----------+------+-----------------------------+-------+--------- > +-------------+--------+----------------------------------------------+ > > MySQL is being forced to create a temporary table because of the GROUP BY, > and it pretty much seems to end up scanning the entire table - queries are > taking upwards of 30 seconds. In the queries that I can remove the JOIN (I > don't always need it, because I don't always need to exclude items in > certain groups) it flies, because the temporary table and filesort pretty > much always go; I can fiddle with indexes to make that always the case; the > GROUP BY can go too, since the rows will always be unique. > > I've typed myself out now so I'll keep the actual question short :) > What can I do about this? Is there a more efficient way to store this data > to avoid these horrific queries? > > If I can supply any more relevant information I'll be only too pleased to. > > Thanks for any hints in advance. > > -- > Chris > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] >