Quoting [EMAIL PROTECTED]:
Look at some my.cnf options. You can tell MySQL to use keys more often than table scans with a variable called max_seeks_keys=100, or something like that.
Definitely. In fact, that's not really the issue at hand, since max_seeks_for_key is already set to 1000 here. Shawn hit the nail right on the head about the actual problem, below.
I don't think that he is worried about table scanning; he is worried about ROW scanning. Each of his rows is so large (2500*(size of float) + 3*(size of tinyint) + some other stuff) that just moving that much data through his machine is consuming too much time.
In the example case I posted earlier:
'Select c1 from raddata_2004_10 where granID between 147568 and 150000'
resolves to returning one column from each of 29548800 rows. Since the users
are actually querying against merge tables of the monthly data sets, the
problem is compounded quickly. We are currently working with 2 years of data,
with another 3-5 expected in due time, and more if we get lucky with instrument
lifetime. Doing full row scans, even for queries whose results are tightly limited by the WHERE constraints, is prohibitively time-consuming.
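Just to put rough numbers on it, here's a back-of-envelope sketch in C. I'm assuming the 2500 readings are stored as 4-byte FLOATs and the three TINYINTs are 1 byte each, and I'm ignoring per-row headers and NULL flags:

#include <stdio.h>

int main(void)
{
    /* Assumed layout: 2500 FLOAT columns at 4 bytes each plus 3 TINYINT
       columns at 1 byte each; per-row headers and NULL flags ignored.  */
    unsigned long long row_bytes = 2500ULL * 4 + 3 * 1;
    unsigned long long rows      = 29548800ULL;   /* rows matched above */
    unsigned long long moved     = row_bytes * rows;
    unsigned long long wanted    = 4ULL * rows;   /* one FLOAT column   */

    printf("bytes per row:            %llu\n", row_bytes);
    printf("bytes moved as full rows: %llu (~%.0f GB)\n",
           moved, moved / (1024.0 * 1024 * 1024));
    printf("bytes actually requested: %llu (~%.0f MB)\n",
           wanted, wanted / (1024.0 * 1024));
    return 0;
}

So a single month's table pushes a couple hundred gigabytes through memory to deliver roughly a hundred megabytes that the query actually asked for, and a multi-year merge table multiplies that accordingly.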
My idea is to create a way to ask the engine to check the value of the second field directly from the DISK copy of each table (without moving each row into local memory). If it matches, a second seek is performed to pull in the field(s) specified in his SELECT. Alternatively, we could create some way to ask the engine to only pull through memory those columns that participate in either the SELECT clause or one of the ON or WHERE conditions. This way we minimize how much data must be moved through memory to resolve one of these rather bulky queries.
We could potentially create a new threshold value: say, if a query uses less than 10% of the fields on a table, that would cause this optimization to kick in.
If anyone else has any ideas (and I am sure there are many) on how to minimize disk traffic and memory throughput in order to handle his rather large dataset, this would be a great time to speak up. I know that this is a rather extreme case, but solving this issue may make MySQL just that much faster for the rest of us.
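To make the 10% threshold concrete, here's roughly what I picture. This is entirely hypothetical; the function name and the figures are just placeholders, and nothing like it exists in MySQL today:

#include <stdio.h>

/* Hypothetical check for the threshold suggested above: use the
 * column-at-a-time read path only when a query references fewer than
 * 10% of the table's fields.  The name and the 10% figure are just
 * placeholders for illustration.                                    */
static int sparse_read_kicks_in(unsigned cols_referenced, unsigned cols_total)
{
    /* referenced / total < 0.10  <=>  referenced * 10 < total */
    return cols_referenced * 10 < cols_total;
}

int main(void)
{
    /* The example query touches 2 columns (c1 in the SELECT, granID in
       the WHERE) out of roughly 2503 in the table.                    */
    printf("2 of 2503 columns:   %s\n",
           sparse_read_kicks_in(2, 2503) ? "optimize" : "normal scan");
    printf("500 of 2503 columns: %s\n",
           sparse_read_kicks_in(500, 2503) ? "optimize" : "normal scan");
    return 0;
}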
Sounds like a great approach. Ideally, with fixed-format rows, any given column
should be readily addressable with a couple of quick calculations. Assuming
that the indexes provide an offset to the matching row(s), the column offset
should be simple addition from there, just the total of the lengths of the
previous columns (plus null flags and other formatting stuff, if those are
interspersed with the column data in the record). Row fragmentation would
confuse the issue a bit more, but according to the manual, fixed-format rows
shouldn't be subject to that.
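For what it's worth, here's a tiny C sketch of that addressing arithmetic against an idealized fixed-format data file. All the layout constants are assumptions for illustration (no per-row header, no NULL bitmap, columns packed in declaration order); a real MyISAM data file adds exactly the kind of per-row bookkeeping bytes mentioned above:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

/* Idealized layout: 2500 FLOAT columns (4 bytes each) followed by
   3 TINYINT columns (1 byte each), no header or NULL flags.      */
#define N_FLOAT_COLS  2500
#define FLOAT_SIZE    4
#define N_TINY_COLS   3
#define ROW_LENGTH    (N_FLOAT_COLS * FLOAT_SIZE + N_TINY_COLS)

/* Start of row `row_no`: fixed rows make this a single multiplication. */
static off_t row_offset(unsigned long row_no)
{
    return (off_t)row_no * ROW_LENGTH;
}

/* Offset of FLOAT column `col_no` inside a row: just the summed widths
   of the columns in front of it.                                      */
static off_t col_offset(unsigned col_no)
{
    return (off_t)col_no * FLOAT_SIZE;
}

/* Pull one 4-byte column of one row straight from disk, instead of
   dragging the whole ~10 KB row through memory.                    */
static int read_one_float(int fd, unsigned long row_no, unsigned col_no,
                          float *out)
{
    off_t pos = row_offset(row_no) + col_offset(col_no);
    return pread(fd, out, sizeof *out, pos) == (ssize_t)sizeof *out;
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <fixed-format data file>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    float c1;
    unsigned long row = 12345;                 /* arbitrary example row */
    if (read_one_float(fd, row, 0, &c1))
        printf("row %lu, column 0: %g\n", row, c1);

    close(fd);
    return 0;
}

The same two sums would drive both of the seeks in Shawn's scheme: one small read for the column in the WHERE test, and a second one for the SELECTed field(s) only when the row actually matches.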
This is definitely an edge case =) One of the reasons we're experimenting with
a database solution is that the raw data files are already too much to manage
efficiently, particularly when it comes to subsetting the data based on
individual location, time, or detector values.
Thanks for all the input! I appreciate it deeply.
ken
===========================================================================
   "Diplomacy is the weapon of the Civilized Warrior" - Hun, A.T.

Ken Gieselman                                             [EMAIL PROTECTED]
System Administrator                  http://www.endlessknot.com/~kgieselm
Endlessknot Communications                      http://www.endlessknot.com
===========================================================================