Quoting Tom Cunningham <[EMAIL PROTECTED]>:
> I think what Harald is saying (and if he's not, then I'm saying it): you could have an alternative table structure like this; it should make queries much quicker:
>
> create table raddata_2004_10_ONE (
>   granID  integer not null,
>   scanID  tinyint unsigned not null,
>   fpID    tinyint not null,
>   c1      float not null,
>   ...
>   c500    float not null
> )
>
> create table raddata_2004_10_TWO (
>   granID  integer not null,
>   scanID  tinyint unsigned not null,
>   fpID    tinyint not null,
>   c501    float not null,
>   ...
>   c1000   float not null
> )
>
> But maybe with a more logical way of partitioning the columns among the different tables.
Yeah, we looked at doing something like this. It would make the tables smaller,
though queries would be much more painful. Unfortunately, there's not really
any common pattern for access to the various detector values. Each of the
people analyzing the data has a different set that they like to work with,
depending on the information they're trying to extract. The merge tables would
also require MASSIVE numbers of filehandles, since it would add another 5x to
the eventual number of tables (all of the tables are aggregated over time via
merge tables for the users).
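(For anyone not following the merge-table setup: the monthly tables get rolled up for the users into MERGE tables, roughly along these lines -- the names here are just illustrative, and the channel columns are trimmed for brevity:)

create table raddata_all (
  granID  integer not null,
  scanID  tinyint unsigned not null,
  fpID    tinyint not null,
  c1      float not null,
  -- ... c2 through c2499 elided ...
  c2500   float not null,
  key (granID)
) engine=MERGE union=(raddata_2004_09, raddata_2004_10) insert_method=LAST;

Every table named in the UNION has to be opened, with its own filehandles, whenever the merge table is used, so splitting each month into five column-partitioned tables multiplies that cost by five.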
> Or, another option, which perhaps you've thought of: have a table like this. It should make indexing much worse, but querying much better:
>
> create table raddata_2004_10_TWO (
>   granID  integer not null,
>   scanID  tinyint unsigned not null,
>   fpID    tinyint not null,
>   cID     smallint not null,
>   cValue  float not null,
>   primary key (granID, scanID, fpID, cID),
>   index (granID, cID)
> )
>
> OLD QUERY: Select c1 from raddata_2004_10 where granID between 147568 and 150000;
>
> NEW QUERY: Select cValue from raddata_2004_10 where granID between 147568 and 150000 and cID=1; (should be v. fast)
Interesting :) It would make reads a lot faster, but I see a couple of
downsides. First, it means that I'm duplicating the 6 bytes worth of ID fields
(plus the row/index overhead) a LOT more -- the main problem with this setup is
the volume of data that I'm trying to address. With 2500 columns, I get 12150
rows per granule, or just short of 3 million rows a day. With the structure
you suggested, I'd get 7.29 billion rows a day... I wonder how the MySQL engine
would handle 235 billion rows per month in a single table. Add the extra overhead,
and I don't think we could manage the storage requirements: it already works out
to just under 10 TB a year as it is now, with the other tables, and this one table
alone would come to about 2.625 TB a month under that scheme, unfortunately.
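(Rough math, working from my numbers above: just short of 3 million rows a day
divided by 12,150 rows per granule is about 240 granules a day; 240 granules x
12,150 rows x 2,500 channels comes to roughly 7.29 billion rows a day; and at
12 bytes per row -- 6 bytes of IDs, 2 for cID, 4 for the float, before any
row/index overhead -- that's around 87 GB a day, or about 2.6 TB a month.)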
> ---> Incidentally, I have a question: when you have a table like this, with a primary key that spans a lot of columns, is there any performance benefit to adding a new primary key as an auto-increment column and keeping the old primary key as a unique index? I thought maybe there would be some addressing/hashing issues that worked out quicker?
Interesting idea. Not sure what the gain would be, at least in this case,
however. Most of the queries are based on channel values, or on other data like
geolocation or time, and they only relate back to this table through the three ID
fields. I'd be willing to give it a shot, but I'm not sure what an auto-increment
field would gain when the queries aren't based on insert order.
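If I'm following you, that would look something like this for the narrow table (the rowID name is just a placeholder):

create table raddata_2004_10 (
  rowID   bigint unsigned not null auto_increment,  -- placeholder surrogate key
  granID  integer not null,
  scanID  tinyint unsigned not null,
  fpID    tinyint not null,
  cID     smallint not null,
  cValue  float not null,
  primary key (rowID),
  unique key (granID, scanID, fpID, cID)            -- old primary key kept as a unique index
);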
Thanks for the insight! Appreciate all the suggestions that you guys are throwing into the hat!
ken

===========================================================================
  "Diplomacy is the weapon of the Civilized Warrior"  - Hun, A.T.

  Ken Gieselman                                     [EMAIL PROTECTED]
  System Administrator             http://www.endlessknot.com/~kgieselm
  Endlessknot Communications               http://www.endlessknot.com
===========================================================================