FULLTEXT and large database

2004-05-25 Thread James Drabb
Hey group,

I am in need of some suggestions.  I am a senior programmer for a
fortune
500 where we mostly use Oracle and MS SQL Server.  However, a project
came
along and no one wanted to spend more money on Oracle or SQL Server
licenses
so I suggested MySQL and to my surprise they let me set it up.  All of
our
Unix and Linux servers are used to the max, so I was only given a 2-way
1.26 GHz PIII, 1.2GB, 136GB SCSI RAID 5 server running Windows 20003.

The MySQL database is being used for a proprietary win/linux/unix syslog
program that was purchased.  Currently there are about 45,000 records
per
hour going in. The admin I set it up for would like to have one months
worth
of data.  How big of a DB can MySQL handle?  After two days of running
there
are 2,160,000 or so records in the database.

The table layout is:

CREATE TABLE `syslogd` (
  `MsgDate` date default NULL,
  `MsgTime` time default NULL,
  `MsgPriority` varchar(30) default NULL,
  `MsgHostname` varchar(255) default NULL,
  `MsgText` text,
  KEY `kiwi_MsgDate` (`MsgDate`),
  KEY `Kiwi_MsgPriority` (`MsgPriority`),
  KEY `kiwi_MsgTime` (`MsgTime`),
  KEY `Kiwi_MsgHostname` (`MsgHostname`),
  FULLTEXT KEY `Kiwi_MsgText` (`MsgText`)
) TYPE=MyISAM;

I have a web app to allow admins to search this table.  Any searches by
MsgDate, MsgTime, MsgPriority or MsgHostname or any combination of them
runs
pretty fast.  When I try to do a search against MsgText, it takes quite
a
while.  I originally tried INSTR(), however Explain showed that no INDEX
was
being used.  Using Match() Against() showed the Kiwi_MsgText FULLTEXT
index
getting used.

What can I do to speed this up?  Should I do a regular INDEX? I used the
configuration setting from the example huge.cnf file.

Thanks for any guidance,

Jim Drabb
-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=-
He who receives an idea from me, receives instruction himself without
lessening mine; as he who lights his taper at mine, receives light
without
darkening me. -- Thomas Jefferson; 1813
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=-
James Drabb JR
Senior Programmer Analyst
Darden Restaurants
Business Systems
JDrabb at Darden dot com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: FULLTEXT and large database

2004-05-25 Thread James Drabb
-Original Message-
From: Jigal van Hemert [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 25, 2004 7:56 AM
To: [EMAIL PROTECTED]
Subject: Re: FULLTEXT and large database

> > Using Match() Against() showed the Kiwi_MsgText FULLTEXT index
getting 
> > used.
> 
> Correct. Full text indexes can only be used on MyISAM tables and will
not
> find (at least in the basic form) > > matches that are present in more
> than 50% of the records.

So is there any way to efficiently search a TEXT field of 5 million rows
for exact phrases?  In this case the MsgText field contains a
descriptive
error message that the admins want to search for to help tune the
network

> > What can I do to speed this up?  Should I do a regular INDEX? I used

> > the configuration setting from the example huge.cnf file.
> 
> Sometimes it's better to do some preprocessing while storing the data.
> If you search for several predefined > keywords it might be wise to
check
> for these keywords when storing the data and use a SET or so to store
> which keywords were present. This will reduce a text search to a bit
> compare.

Unfortunately, the syslog application is a closed sourced/proprietary
app
that the admins purchased and I do not have access to the source code to
change things.  The syslog app created the MySQL table with no indexes
so
I created the index to help in searching the data.

When I run a FULLTEXT search and view the process list of MySQL in
MySQL,
I see in the state column that MySQL is "sorting results" which appears
to
take up some time.

Here is a typical query:

Select
  DATE_FORMAT(MsgDate, '%c/%d/%Y') AS 'MsgDate',
  MsgTime, MsgPriority, MsgHostname, MsgText
From
  syslogd
WHERE
  MATCH (MsgText) AGAINST ('RADIUS')
Order By
  MsgDate Desc
LIMIT 200

Here an admin wants to find logs about a RADIUS server.  The index on
MsgDate
is in ascending order, I guess I could make it descending?

> Regards, Jigal.

Thanks for any help,

Jim Drabb
-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=-
He who receives an idea from me, receives instruction himself without
lessening mine; as he who lights his taper at mine, receives light
without
darkening me. -- Thomas Jefferson; 1813
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=-
James Drabb JR
Senior Programmer Analyst
Darden Restaurants
Business Systems
JDrabb at Darden dot com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: FULLTEXT and large database

2004-05-25 Thread James Drabb
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 25, 2004 8:52 AM
To: [EMAIL PROTECTED]
Cc: James Drabb; [EMAIL PROTECTED]
Subject: Re: FULLTEXT and large database


> One thing that ocurs to me is that, if you are using it for syslog
type
> data, the data being logged will have a relatively samll "vocabulary".
> No tesxt-based indexing system will handle this very well. Like
looking
> up a very common name in a telephone directory, you will get a lot of
> hits which will then have to be searched linearly for the right one.
And
> linear search of meven a subset of a buig database is bad news,
because
> the records will be physically scattered and not amenable to caching.
> 
> *If* it is true that your data is highly repetitive, could you arrange
> to factor out the repetitive bits? For example, if there ate only 20
> record types can you parse out the type and convert it to an integer?
>   Alec

The syslog app is proprietary/closed source which I cannot change.
The main column that admins are interested in is MsgText text which
has specific error/info/warning messages such as:


%PIX-4-106023: Deny udp src inside:10.1.5.242/137 dst
core:146.217.142.101/137 by access-group "inside_in"


May 25 09:59:40 trend.darden.com MSWinEventLog<009>0<009>Security<009>
14445084<009>Tue May 25 09:59:34 2004<009>593<009>Security<009>
IUSR_TREND<009>User<009>Success Audit<009>TREND<009>Detailed Tracking
<009><009>A process has exited: Process ID: 32036 Image File
Name: C:OfficescanPCCSRVWebCGIcgiRqHotFix.exe User Name: IUSR_TREND
Domain: TREND Logon ID: (XX)<009>418997

Thanks again,

Jim Drabb
-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=-
He who receives an idea from me, receives instruction himself without
lessening mine; as he who lights his taper at mine, receives light
without
darkening me. -- Thomas Jefferson; 1813
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=-
James Drabb JR
Senior Programmer Analyst
Darden Restaurants
Business Systems
JDrabb at Darden dot com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: FULLTEXT and large database

2004-05-25 Thread James Drabb
-Original Message-
From: Brent Baisley [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 25, 2004 9:50 AM
To: James Drabb
Cc: Jigal van Hemert; [EMAIL PROTECTED]
Subject: Re: FULLTEXT and large database

> You may try increasing your sort_buffer_size and key_buffer_size,
> see if that speeds up the searching and sorting.

I increased key_buffer_size to 768M and sort_buffer_size to 256M
and it seems to have helped.  For example this query on
3.1 million rows returned in 0.02 seconds:

Select DATE_FORMAT(MsgDate, '%c/%d/%Y') AS 'MsgDate', MsgTime,
MsgPriority, MsgHostname, MsgText
>From syslogd
WHERE MATCH (MsgText) AGAINST ('LINK-CLUSTER_MEMBER_1-3-UPDOWN')
LIMIT 200


However, I have noticed something weird with a FULLTEXT search.  This
query:

Select DATE_FORMAT(MsgDate, '%c/%d/%Y') AS 'MsgDate', MsgTime,
MsgPriority, MsgHostname, MsgText
>From syslogd
WHERE MsgPriority = 'Local7.Notice'
Order By MsgDate Desc LIMIT 200

Returns several rows and in the MsgText field (which has the FULLTEXT
index) for
one of the rows is this text:

12822: May 25 09:14:10.400 EST: %LINEPROTO-CLUSTER_MEMBER_1-5-UPDOWN:
Line protocol on Interface FastEthernet0/11, changed state to down

Now if I run this query to search for something in that field, I get no
records:

Select DATE_FORMAT(MsgDate, '%c/%d/%Y') AS 'MsgDate', MsgTime,
MsgPriority, MsgHostname, MsgText
>From syslogd
WHERE MATCH (MsgText) AGAINST ('May')
Order By MsgDate Desc LIMIT 200

The text 'May' is in the column, yet I get no rows returned?  The admins
are looking
to be able to search for an arbitrary string in the MsgText field that
can be anywhere
in the field not just at the beginning.  So it seems that a regular
index will not work,
yet a FULLTEXT index doesn't seem to be getting all the needed data.  

Thanks for the tip,

Jim Drabb
-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=-
He who receives an idea from me, receives instruction himself without
lessening mine; as he who lights his taper at mine, receives light
without
darkening me. -- Thomas Jefferson; 1813
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=-
James Drabb JR
Senior Programmer Analyst
Darden Restaurants
Business Systems
JDrabb at Darden dot com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: FULLTEXT and large database

2004-05-25 Thread James Drabb
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
Harald Fuchs
Sent: Tuesday, May 25, 2004 11:18 AM
To: James Drabb
Subject: Re: FULLTEXT and large database

> > I seem to get inconsistent results when using the FULLTEXT index
> > speed-wise on 3 million+ records.  Here is a query that has been
> > in the "Sorting results" stage for 200 seconds.  There is a
> > LIMIT clause of 200, so it cannot take MySQL 200 seconds to
> > sort 200 records.

> You've got that wrong.  MySQL has to sort *the entire result set*
> (before LIMITing) in order to find the 200 most recent entries.
> 
> Thus my question how many rows the MATCH would return without the
LIMIT.

O!.  Well then it would be a few hundred thousand to possibly a few
million.

I cannot really remove any data since the admins would like 30 days
worth
And right now there is only about 3 days worth.  This project is still
in
pilot, so maybe we may need to limit to 14 days of data or so.

Jim Drabb
-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=-
He who receives an idea from me, receives instruction himself without
lessening mine; as he who lights his taper at mine, receives light
without
darkening me. -- Thomas Jefferson; 1813
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=-
James Drabb JR
Senior Programmer Analyst
Darden Restaurants
Business Systems
JDrabb at Darden dot com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: Unable to Programatically Create DB Connection

2004-05-25 Thread James Drabb
-Original Message-
From: Scott D. Spiegler [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 25, 2004 11:37 AM
To: [EMAIL PROTECTED]
Subject: Unable to Programatically Create DB Connection

> I am trying to programatically connect to my DB, but I am not sure
what
> the connection string should be. I am using this statement:
> 
> conn = 
>
DriverManager.getConnection("jdbc:mysql://localhost/test?user=scott&pass
word=cuatro");
> 
> I am getting this exception message:
>  
>  database_test.DBConnector
>  SQLException: No suitable driver
>  
>  SQLState: 08001
>  
>  VendorError: 0
>   
> Any idea as to what the correct, connection string  might be?
>  
> Thanks, Scott

Did you load the driver with Class.forName?

Here is an example:


import java.sql.Connection; 
import java.sql.DriverManager; 
import java.sql.SQLException; 

// Notice, do not import com.mysql.jdbc.* 
// or you will have problems! 

public class LoadDriver { 
public static void main(String[] args) { 
try { 
// The newInstance() call is a work around for some 
// broken Java implementations

Class.forName("com.mysql.jdbc.Driver").newInstance(); 
} catch (Exception ex) { 
// handle the error 
}
}

This is all in the ConnectorJ docs at:
http://dev.mysql.com/doc/connector/j/en/#id2801034

Jim Drabb
-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=-
He who receives an idea from me, receives instruction himself without
lessening mine; as he who lights his taper at mine, receives light
without
darkening me. -- Thomas Jefferson; 1813
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=-
James Drabb JR
Senior Programmer Analyst
Darden Restaurants
Business Systems
JDrabb at Darden dot com


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]