Hi,

On Thursday, October 21, 2004, at 04:40 PM, [EMAIL PROTECTED] wrote:

I don't think he is worried about table scanning; he is worried about
ROW scanning. Each of his rows is so large (2500*(size of float) +
3*(size of tinyint) + some other stuff) that just moving that much data
through his machine is consuming too much time.


If you have a query that does something like this:

SELECT onefield
FROM hugetable
WHERE indexed_column = 12
        AND non_indexed = 6

Odds are that you will be doing an indexed search. Say that for his
indexed_column he gets 24M rows that match its condition out of the 16B
rows in his table (way less than 30%). The next thing the engine
has to do is LOAD each of those 24M rows (every single column) into
memory so that the value of the non_indexed column can be compared to 6.
In order to perform that second comparison, the memory bus, the hard
drives, and everything else involved in reading records will have to transfer
275.4 GB of data (at least once), just so that he can get the value of
the one column he specified in his SELECT statement out of each row that
matches his two WHERE conditions.
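
For a sense of scale, here is a rough back-of-envelope calculation of that figure (the 4-byte FLOAT and 1-byte TINYINT sizes are assumptions on my part; per-row storage overhead and the "other stuff" presumably account for the gap up to the 275.4 GB quoted above):

# Rough estimate of data moved just to test the non-indexed column.
# Assumes 4-byte FLOATs and 1-byte TINYINTs; ignores row-format overhead.
matching_rows = 24_000_000            # rows matched by the indexed column
row_bytes = 2500 * 4 + 3 * 1          # 2500 floats + 3 tinyints = 10,003 bytes
data_gb = matching_rows * row_bytes / 1e9
print("%.0f GB of column data alone" % data_gb)   # ~240 GB before overhead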


My idea is to create a way to ask the engine to check the value of the second
field directly from the DISK copy of each table (without moving each row
into local memory). If it matches, a second seek is performed to pull in
the field(s) specified in his SELECT. Alternatively, we could create some
way to ask the engine to pull through memory only those columns that
participate in the SELECT clause or in one of the ON or WHERE conditions.
Either way we minimize how much data must be moved through memory to
resolve one of these rather bulky queries.
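
To make the two-seek idea concrete, here is a minimal sketch for a hypothetical fixed-width row file (the layout, offsets, and function are invented for illustration; this is not how any MySQL storage engine actually stores rows):

import struct

ROW_SIZE = 10_003          # hypothetical fixed-width row: 2500 floats + 3 tinyints
NON_INDEXED_OFF = 10_000   # hypothetical byte offset of the non-indexed TINYINT
ONEFIELD_OFF = 0           # hypothetical byte offset of the selected FLOAT column

def probe(datafile, row_numbers, wanted_value):
    """For each candidate row found via the index, read only the one-byte
    non-indexed column; fetch the selected field only when it matches."""
    results = []
    with open(datafile, "rb") as f:
        for rn in row_numbers:
            base = rn * ROW_SIZE
            f.seek(base + NON_INDEXED_OFF)      # first seek: 1-byte probe
            if f.read(1)[0] != wanted_value:
                continue
            f.seek(base + ONEFIELD_OFF)         # second seek: the SELECTed field
            results.append(struct.unpack("<f", f.read(4))[0])
    return results

This trades one large read per candidate row for two small reads, so its value hinges entirely on how cheap those extra seeks are.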


We could potentially create a new threshold value; say, if a query uses less
than 10% of the fields of a table, that would cause this optimization to
kick in.
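
A sketch of how such a threshold might drive the choice of access strategy (the 10% figure and the names are placeholders, not anything that exists in the optimizer):

COLUMN_FRACTION_THRESHOLD = 0.10   # hypothetical tuning knob

def choose_strategy(columns_used, columns_in_table):
    """Use column probes only when the query touches a small fraction of a
    very wide table; otherwise read whole rows as usual."""
    if columns_used / columns_in_table < COLUMN_FRACTION_THRESHOLD:
        return "column-probe"      # the optimization proposed above
    return "full-row"

# e.g. 3 columns out of 2503 is well under 10%, so the probe path is chosen
print(choose_strategy(3, 2503))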


If anyone else has any ideas (and I am sure there are many) on how to
minimize disk traffic and memory throughput in order to handle his rather
large dataset, this would be a great time to speak up. I know that this is
a rather extreme case, but solving it may make MySQL just that much
faster for the rest of us.

The problem with the above is that you would increase disk seeks by one per row per column, and a disk seek is much more expensive than a continuous read. The only case where I could see this optimization being useful is a single column being read from a fixed-width table (then you could compute the initial seek offset without adding any extra seeks). With 2 columns (which is your example above), it would be faster to do 1 disk seek and read the entire row (10 KB in this case, with 2500 float fields) than to do 2 disk seeks and read 8 bytes (2 columns). Moving data through the CPU and memory is trivial in cost compared to doing an extra disk seek.
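
To put rough numbers on that trade-off (the ~9 ms seek time and ~50 MB/s sequential throughput are ballpark figures I am assuming for a current disk, not measurements):

SEEK_MS = 9.0                 # assumed average seek + rotational latency
SEQ_MB_PER_S = 50.0           # assumed sequential read throughput
ROW_BYTES = 10_003            # the ~10 KB fixed-width row from the example

def read_ms(nbytes):
    return nbytes / (SEQ_MB_PER_S * 1e6) * 1000

one_seek_full_row = SEEK_MS + read_ms(ROW_BYTES)   # ~9.2 ms per candidate row
two_seeks_two_cols = 2 * SEEK_MS + read_ms(8)      # ~18.0 ms per candidate row

print("full row: %.1f ms, column probe: %.1f ms"
      % (one_seek_full_row, two_seeks_two_cols))

Under those assumptions the extra seek roughly doubles the per-row cost, which is why reading the whole 10 KB row in one pass wins.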


Regards,

Harrison

