FULLTEXT and large database
Hey group, I am in need of some suggestions. I am a senior programmer for a fortune 500 where we mostly use Oracle and MS SQL Server. However, a project came along and no one wanted to spend more money on Oracle or SQL Server licenses so I suggested MySQL and to my surprise they let me set it up. All of our Unix and Linux servers are used to the max, so I was only given a 2-way 1.26 GHz PIII, 1.2GB, 136GB SCSI RAID 5 server running Windows 20003. The MySQL database is being used for a proprietary win/linux/unix syslog program that was purchased. Currently there are about 45,000 records per hour going in. The admin I set it up for would like to have one months worth of data. How big of a DB can MySQL handle? After two days of running there are 2,160,000 or so records in the database. The table layout is: CREATE TABLE `syslogd` ( `MsgDate` date default NULL, `MsgTime` time default NULL, `MsgPriority` varchar(30) default NULL, `MsgHostname` varchar(255) default NULL, `MsgText` text, KEY `kiwi_MsgDate` (`MsgDate`), KEY `Kiwi_MsgPriority` (`MsgPriority`), KEY `kiwi_MsgTime` (`MsgTime`), KEY `Kiwi_MsgHostname` (`MsgHostname`), FULLTEXT KEY `Kiwi_MsgText` (`MsgText`) ) TYPE=MyISAM; I have a web app to allow admins to search this table. Any searches by MsgDate, MsgTime, MsgPriority or MsgHostname or any combination of them runs pretty fast. When I try to do a search against MsgText, it takes quite a while. I originally tried INSTR(), however Explain showed that no INDEX was being used. Using Match() Against() showed the Kiwi_MsgText FULLTEXT index getting used. What can I do to speed this up? Should I do a regular INDEX? I used the configuration setting from the example huge.cnf file. Thanks for any guidance, Jim Drabb -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- =- He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me. -- Thomas Jefferson; 1813 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- =- James Drabb JR Senior Programmer Analyst Darden Restaurants Business Systems JDrabb at Darden dot com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: FULLTEXT and large database
-Original Message- From: Jigal van Hemert [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 25, 2004 7:56 AM To: [EMAIL PROTECTED] Subject: Re: FULLTEXT and large database > > Using Match() Against() showed the Kiwi_MsgText FULLTEXT index getting > > used. > > Correct. Full text indexes can only be used on MyISAM tables and will not > find (at least in the basic form) > > matches that are present in more > than 50% of the records. So is there any way to efficiently search a TEXT field of 5 million rows for exact phrases? In this case the MsgText field contains a descriptive error message that the admins want to search for to help tune the network > > What can I do to speed this up? Should I do a regular INDEX? I used > > the configuration setting from the example huge.cnf file. > > Sometimes it's better to do some preprocessing while storing the data. > If you search for several predefined > keywords it might be wise to check > for these keywords when storing the data and use a SET or so to store > which keywords were present. This will reduce a text search to a bit > compare. Unfortunately, the syslog application is a closed sourced/proprietary app that the admins purchased and I do not have access to the source code to change things. The syslog app created the MySQL table with no indexes so I created the index to help in searching the data. When I run a FULLTEXT search and view the process list of MySQL in MySQL, I see in the state column that MySQL is "sorting results" which appears to take up some time. Here is a typical query: Select DATE_FORMAT(MsgDate, '%c/%d/%Y') AS 'MsgDate', MsgTime, MsgPriority, MsgHostname, MsgText From syslogd WHERE MATCH (MsgText) AGAINST ('RADIUS') Order By MsgDate Desc LIMIT 200 Here an admin wants to find logs about a RADIUS server. The index on MsgDate is in ascending order, I guess I could make it descending? > Regards, Jigal. Thanks for any help, Jim Drabb -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- =- He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me. -- Thomas Jefferson; 1813 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- =- James Drabb JR Senior Programmer Analyst Darden Restaurants Business Systems JDrabb at Darden dot com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: FULLTEXT and large database
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 25, 2004 8:52 AM To: [EMAIL PROTECTED] Cc: James Drabb; [EMAIL PROTECTED] Subject: Re: FULLTEXT and large database > One thing that ocurs to me is that, if you are using it for syslog type > data, the data being logged will have a relatively samll "vocabulary". > No tesxt-based indexing system will handle this very well. Like looking > up a very common name in a telephone directory, you will get a lot of > hits which will then have to be searched linearly for the right one. And > linear search of meven a subset of a buig database is bad news, because > the records will be physically scattered and not amenable to caching. > > *If* it is true that your data is highly repetitive, could you arrange > to factor out the repetitive bits? For example, if there ate only 20 > record types can you parse out the type and convert it to an integer? > Alec The syslog app is proprietary/closed source which I cannot change. The main column that admins are interested in is MsgText text which has specific error/info/warning messages such as: %PIX-4-106023: Deny udp src inside:10.1.5.242/137 dst core:146.217.142.101/137 by access-group "inside_in" May 25 09:59:40 trend.darden.com MSWinEventLog<009>0<009>Security<009> 14445084<009>Tue May 25 09:59:34 2004<009>593<009>Security<009> IUSR_TREND<009>User<009>Success Audit<009>TREND<009>Detailed Tracking <009><009>A process has exited: Process ID: 32036 Image File Name: C:OfficescanPCCSRVWebCGIcgiRqHotFix.exe User Name: IUSR_TREND Domain: TREND Logon ID: (XX)<009>418997 Thanks again, Jim Drabb -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- =- He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me. -- Thomas Jefferson; 1813 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- =- James Drabb JR Senior Programmer Analyst Darden Restaurants Business Systems JDrabb at Darden dot com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: FULLTEXT and large database
-Original Message- From: Brent Baisley [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 25, 2004 9:50 AM To: James Drabb Cc: Jigal van Hemert; [EMAIL PROTECTED] Subject: Re: FULLTEXT and large database > You may try increasing your sort_buffer_size and key_buffer_size, > see if that speeds up the searching and sorting. I increased key_buffer_size to 768M and sort_buffer_size to 256M and it seems to have helped. For example this query on 3.1 million rows returned in 0.02 seconds: Select DATE_FORMAT(MsgDate, '%c/%d/%Y') AS 'MsgDate', MsgTime, MsgPriority, MsgHostname, MsgText >From syslogd WHERE MATCH (MsgText) AGAINST ('LINK-CLUSTER_MEMBER_1-3-UPDOWN') LIMIT 200 However, I have noticed something weird with a FULLTEXT search. This query: Select DATE_FORMAT(MsgDate, '%c/%d/%Y') AS 'MsgDate', MsgTime, MsgPriority, MsgHostname, MsgText >From syslogd WHERE MsgPriority = 'Local7.Notice' Order By MsgDate Desc LIMIT 200 Returns several rows and in the MsgText field (which has the FULLTEXT index) for one of the rows is this text: 12822: May 25 09:14:10.400 EST: %LINEPROTO-CLUSTER_MEMBER_1-5-UPDOWN: Line protocol on Interface FastEthernet0/11, changed state to down Now if I run this query to search for something in that field, I get no records: Select DATE_FORMAT(MsgDate, '%c/%d/%Y') AS 'MsgDate', MsgTime, MsgPriority, MsgHostname, MsgText >From syslogd WHERE MATCH (MsgText) AGAINST ('May') Order By MsgDate Desc LIMIT 200 The text 'May' is in the column, yet I get no rows returned? The admins are looking to be able to search for an arbitrary string in the MsgText field that can be anywhere in the field not just at the beginning. So it seems that a regular index will not work, yet a FULLTEXT index doesn't seem to be getting all the needed data. Thanks for the tip, Jim Drabb -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- =- He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me. -- Thomas Jefferson; 1813 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- =- James Drabb JR Senior Programmer Analyst Darden Restaurants Business Systems JDrabb at Darden dot com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: FULLTEXT and large database
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Harald Fuchs Sent: Tuesday, May 25, 2004 11:18 AM To: James Drabb Subject: Re: FULLTEXT and large database > > I seem to get inconsistent results when using the FULLTEXT index > > speed-wise on 3 million+ records. Here is a query that has been > > in the "Sorting results" stage for 200 seconds. There is a > > LIMIT clause of 200, so it cannot take MySQL 200 seconds to > > sort 200 records. > You've got that wrong. MySQL has to sort *the entire result set* > (before LIMITing) in order to find the 200 most recent entries. > > Thus my question how many rows the MATCH would return without the LIMIT. O!. Well then it would be a few hundred thousand to possibly a few million. I cannot really remove any data since the admins would like 30 days worth And right now there is only about 3 days worth. This project is still in pilot, so maybe we may need to limit to 14 days of data or so. Jim Drabb -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- =- He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me. -- Thomas Jefferson; 1813 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- =- James Drabb JR Senior Programmer Analyst Darden Restaurants Business Systems JDrabb at Darden dot com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: Unable to Programatically Create DB Connection
-Original Message- From: Scott D. Spiegler [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 25, 2004 11:37 AM To: [EMAIL PROTECTED] Subject: Unable to Programatically Create DB Connection > I am trying to programatically connect to my DB, but I am not sure what > the connection string should be. I am using this statement: > > conn = > DriverManager.getConnection("jdbc:mysql://localhost/test?user=scott&pass word=cuatro"); > > I am getting this exception message: > > database_test.DBConnector > SQLException: No suitable driver > > SQLState: 08001 > > VendorError: 0 > > Any idea as to what the correct, connection string might be? > > Thanks, Scott Did you load the driver with Class.forName? Here is an example: import java.sql.Connection; import java.sql.DriverManager; import java.sql.SQLException; // Notice, do not import com.mysql.jdbc.* // or you will have problems! public class LoadDriver { public static void main(String[] args) { try { // The newInstance() call is a work around for some // broken Java implementations Class.forName("com.mysql.jdbc.Driver").newInstance(); } catch (Exception ex) { // handle the error } } This is all in the ConnectorJ docs at: http://dev.mysql.com/doc/connector/j/en/#id2801034 Jim Drabb -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- =- He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me. -- Thomas Jefferson; 1813 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- =- James Drabb JR Senior Programmer Analyst Darden Restaurants Business Systems JDrabb at Darden dot com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]