Hi,

Here is the SQL query I have used in a table with less than 200.000 records,
and I have searched for a single word named "deputat" which was found in
2208 records, but only 20 of them were returned.

Here is the table:

CREATE TABLE `articles` (
  `id_newspapers` smallint(3) unsigned NOT NULL default '0',
  `id_sections` smallint(3) unsigned NOT NULL default '0',
  `id` int(6) unsigned NOT NULL auto_increment,
  `hash` varchar(16) NOT NULL default '',
  `url` varchar(255) NOT NULL default '',
  `full_url` varchar(255) NOT NULL default '',
  `pre_title` varchar(255) NOT NULL default '',
  `title` varchar(255) NOT NULL default '',
  `post_title` varchar(255) NOT NULL default '',
  `body` text NOT NULL,
  `body_hash` varchar(32) NOT NULL default '',
  `article_ident` varchar(255) NOT NULL default '',
  `date` date NOT NULL default '0000-00-00',
  `time` time NOT NULL default '00:00:00',
  `id_categories` tinyint(2) unsigned NOT NULL default '0',
  `active_view` tinyint(1) unsigned NOT NULL default '1',
  `size` smallint(5) unsigned NOT NULL default '0',
  PRIMARY KEY  (`id_newspapers`,`id_sections`,`id`),
  UNIQUE KEY `hash` (`hash`),
  UNIQUE KEY `id_newspapers` (`id_newspapers`,`body_hash`),
  KEY `date` (`date`),
  KEY `id_categories` (`id_categories`),
  KEY `size` (`size`),
  KEY `title` (`title`),
  KEY `time` (`time`),
  FULLTEXT KEY `full` (`pre_title`,`title`,`post_title`,`body`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

And this is the SQL query:

select sql_calc_found_rows hash, title, date, substr(body, 1, 250) as
preview, match(pre_title, title, post_title, body) against('deputat') as
rank from articles where match(pre_title, title, post_title, body)
against('deputat' in boolean mode) order by date desc, rank desc limit 0,20;

This query took more than 12 seconds the first time when I have used it.
After using it a second time, it takes much less because it was cached.
I have used 2 times the match() function, because I want to be able to
search in boolean mode (in order to be able to use +, -, and * operators)
but I have seen that the queries are same as slow if I use it only once, so
this is not a problem.

I have tried to "explain" this query, and here is the result:

explain select sql_calc_found_rows hash, title, date, substr(body, 1, 250)
as preview, match(pre_title, title, post_title, body) against('deputat') as
rank from articles where match(pre_title, title, post_title, body)
against('deputat' in boolean mode) order by date desc, rank
 desc limit 0,20\G

           id: 1
  select_type: SIMPLE
        table: articles
         type: fulltext
possible_keys: full
          key: full
      key_len: 0
          ref:
         rows: 1
        Extra: Using where; Using filesort

So, what do you think, for such a query, can I do something to make MySQL
work faster, or do you think that there are other ways to do it faster in
perl?
(I am thinking that maybe there are some CPAN modules that can create a
fulltext index and search faster with it... creating my own personalised
database maybe...)

If these queries work so slow, I am wondering how slow it will work when the
database will have a million records.

Thank you.

Teddy

----- Original Message ----- 
From: "JupiterHost.Net" <[EMAIL PROTECTED]>
To: <beginners@perl.org>
Sent: Thursday, June 01, 2006 4:37 PM
Subject: Re: faster search engine for fulltext search


> >>Perhaps doing your queries to return only some instead of all records
> >>will help, a great module for doing this is:
> >>
> >>http://search.cpan.org/perldoc?Data::Paginate
> >>
> >
> >
> > I am using a limit clause and the sql query returns at most 20 records,
but
> > when very many records are found, the search is very slow, even if it
> > returns only 20.
> >
> > I know how many of them are found, because I use found_rows() in order
to
>
> perldoc -f found_rows
> perldoc DBI
> perldoc DBD::mysql
>
> have no refernce to that function?
>
> Care to reference found_rows()'s docs?
>
> > show how many records would be found if no limit clause would be used.
>
> Depending on the DB it may or may not matter "speed"-wise(MySQl's LIMIT
> is much more efficient than Postgre's for example) the point for the OP
> was this:
>
>   pagination will help your script be faster in the sense that the array
> of records its working with is 20 instead of 20,000,000 (IE independant
> of how efficient the DB is with a proper query, which is not a perl
> issue in the least)
>
> it also makes it easier to use for your end user, but thats a UI issue.
>
> -- 
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> <http://learn.perl.org/> <http://learn.perl.org/first-response>
>
>


-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>


Reply via email to