Quoting [EMAIL PROTECTED]:
Look at some my.cnf options. You can tell MySQL to use keys more often than table scans with a variable called max_seeks_keys=100, or something like that.
Definitely. In fact, that's not really the issue at hand, since max_seeks_for_key is already set to 1000 here. Shawn hit the nail right on the head about the actual problem, below.
I don't think that he is worried about table scanning; he is worried about ROW scanning. Each of his rows is so large (2500*(size of float) + 3*(size of tinyint) + some other stuff) that just moving that much data through his machine is consuming too much time.
In the example case I posted earlier:
'Select c1 from raddata_2004_10 where granID between 147568 and 150000'
resolves to returning one column from each of 29548800 rows. Since the users
are actually querying against merge tables of the monthly data sets, the
problem is compounded quickly. We are currently working with 2 years of data,
with another 3-5 expected in due time, and more if we get lucky with instrument
lifetime. Doing full row scans, even for queries whose results are tightly limited by the WHERE constraints, is prohibitively time-consuming.
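Just to put rough numbers on it, here's a back-of-envelope sketch in C. I'm assuming the 2500 readings are stored as 4-byte FLOATs and the three TINYINTs are 1 byte each, and I'm ignoring per-row headers and NULL flags:

#include <stdio.h>

int main(void)
{
    /* Assumed layout: 2500 FLOAT columns at 4 bytes each plus 3 TINYINT
       columns at 1 byte each; per-row headers and NULL flags ignored.  */
    unsigned long long row_bytes = 2500ULL * 4 + 3 * 1;
    unsigned long long rows      = 29548800ULL;   /* rows matched above */
    unsigned long long moved     = row_bytes * rows;
    unsigned long long wanted    = 4ULL * rows;   /* one FLOAT column   */

    printf("bytes per row:            %llu\n", row_bytes);
    printf("bytes moved as full rows: %llu (~%.0f GB)\n",
           moved, moved / (1024.0 * 1024 * 1024));
    printf("bytes actually requested: %llu (~%.0f MB)\n",
           wanted, wanted / (1024.0 * 1024));
    return 0;
}

So a single month's table pushes a couple hundred gigabytes through memory to deliver roughly a hundred megabytes that the query actually asked for, and a multi-year merge table multiplies that accordingly.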
My idea is to create a way to ask the engine to check the value of the second field directly from the DISK copy of each table (without moving each row into local memory). If it matches, a second seek is performed to pull in the field(s) specified in his SELECT. Alternatively, we could create some way to ask the engine to only pull through memory those columns that participate in either the SELECT clause or one of the ON or WHERE conditions. This way we minimize how much data must be moved through memory to resolve one of these rather bulky queries.
We could potentially create a new threshold value: say, if a query uses less than 10% of the fields on a table, that would cause this optimization to kick in.
If anyone else has any ideas (and I am sure there are many) on how to minimize disk traffic and memory throughput in order to handle his rather large dataset, this would be a great time to speak up. I know that this is a rather extreme case, but solving this issue may make MySQL just that much faster for the rest of us.
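To make the 10% threshold concrete, here's roughly what I picture. This is entirely hypothetical; the function name and the figures are just placeholders, and nothing like it exists in MySQL today:

#include <stdio.h>

/* Hypothetical check for the threshold suggested above: use the
 * column-at-a-time read path only when a query references fewer than
 * 10% of the table's fields.  The name and the 10% figure are just
 * placeholders for illustration.                                    */
static int sparse_read_kicks_in(unsigned cols_referenced, unsigned cols_total)
{
    /* referenced / total < 0.10  <=>  referenced * 10 < total */
    return cols_referenced * 10 < cols_total;
}

int main(void)
{
    /* The example query touches 2 columns (c1 in the SELECT, granID in
       the WHERE) out of roughly 2503 in the table.                    */
    printf("2 of 2503 columns:   %s\n",
           sparse_read_kicks_in(2, 2503) ? "optimize" : "normal scan");
    printf("500 of 2503 columns: %s\n",
           sparse_read_kicks_in(500, 2503) ? "optimize" : "normal scan");
    return 0;
}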
Sounds like a great approach. Ideally, with fixed-format rows, any given column
should be readily addressable with a couple of quick calculations. Assuming
that the indexes provide an offset to the matching row(s), the column offset
should be simple addition from there, just the total of the lengths of the
previous columns (plus null flags and other formatting stuff, if those are
interspersed with the column data in the record). Row fragmentation would
confuse the issue a bit more, but according to the manual, fixed-format rows
shouldn't be subject to that.
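For what it's worth, here's a tiny C sketch of that addressing arithmetic against an idealized fixed-format data file. All the layout constants are assumptions for illustration (no per-row header, no NULL bitmap, columns packed in declaration order); a real MyISAM data file adds exactly the kind of per-row bookkeeping bytes mentioned above:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

/* Idealized layout: 2500 FLOAT columns (4 bytes each) followed by
   3 TINYINT columns (1 byte each), no header or NULL flags.      */
#define N_FLOAT_COLS  2500
#define FLOAT_SIZE    4
#define N_TINY_COLS   3
#define ROW_LENGTH    (N_FLOAT_COLS * FLOAT_SIZE + N_TINY_COLS)

/* Start of row `row_no`: fixed rows make this a single multiplication. */
static off_t row_offset(unsigned long row_no)
{
    return (off_t)row_no * ROW_LENGTH;
}

/* Offset of FLOAT column `col_no` inside a row: just the summed widths
   of the columns in front of it.                                      */
static off_t col_offset(unsigned col_no)
{
    return (off_t)col_no * FLOAT_SIZE;
}

/* Pull one 4-byte column of one row straight from disk, instead of
   dragging the whole ~10 KB row through memory.                    */
static int read_one_float(int fd, unsigned long row_no, unsigned col_no,
                          float *out)
{
    off_t pos = row_offset(row_no) + col_offset(col_no);
    return pread(fd, out, sizeof *out, pos) == (ssize_t)sizeof *out;
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <fixed-format data file>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    float c1;
    unsigned long row = 12345;                 /* arbitrary example row */
    if (read_one_float(fd, row, 0, &c1))
        printf("row %lu, column 0: %g\n", row, c1);

    close(fd);
    return 0;
}

The same two sums would drive both of the seeks in Shawn's scheme: one small read for the column in the WHERE test, and a second one for the SELECTed field(s) only when the row actually matches.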
This is definitely an edge case =) One of the reasons we're experimenting with
a database solution is that the raw data files are already too much to manage
efficiently, particularly when it comes to subsetting the data based on
individual location, time, or detector values.
Thanks for all the input! I appreciate it deeply.
ken
===========================================================================
   "Diplomacy is the weapon of the Civilized Warrior" - Hun, A.T.

Ken Gieselman                                             [EMAIL PROTECTED]
System Administrator                  http://www.endlessknot.com/~kgieselm
Endlessknot Communications                      http://www.endlessknot.com
===========================================================================