Quoting Tom Cunningham <[EMAIL PROTECTED]>:
> I think what Harald is saying (and if he's not, then I'm saying it): you could have an alternative table structure like this; it should make queries much quicker:
>
> create table raddata_2004_10_ONE (
>   granID  integer not null,
>   scanID  tinyint unsigned not null,
>   fpID    tinyint not null,
>   c1      float not null,
>   ...
>   c500    float not null
> )
>
> create table raddata_2004_10_TWO (
>   granID  integer not null,
>   scanID  tinyint unsigned not null,
>   fpID    tinyint not null,
>   c501    float not null,
>   ...
>   c1000   float not null
> )
>
> But maybe with a more logical way of partitioning the columns among the different tables.
Yeah, we looked at doing something like this. It would make the tables smaller,
though queries would be much more painful. Unfortunately, there's not really
any common pattern for access to the various detector values. Each of the
people analyzing the data has a different set that they like to work with,
depending on the information they're trying to extract. The merge tables would
also require MASSIVE numbers of filehandles, since it would add another 5x to
the eventual number of tables (all of the tables are aggregated over time via
merge tables for the users).
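(For anyone not following the merge-table setup: the monthly tables get rolled up for the users into MERGE tables, roughly along these lines -- the names here are just illustrative, and the channel columns are trimmed for brevity:)

create table raddata_all (
  granID  integer not null,
  scanID  tinyint unsigned not null,
  fpID    tinyint not null,
  c1      float not null,
  -- ... c2 through c2499 elided ...
  c2500   float not null,
  key (granID)
) engine=MERGE union=(raddata_2004_09, raddata_2004_10) insert_method=LAST;

Every table named in the UNION has to be opened, with its own filehandles, whenever the merge table is used, so splitting each month into five column-partitioned tables multiplies that cost by five.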
> Or, another option, which perhaps you've thought of: have a table like this. It should make indexing much worse, but querying much better:
>
> create table raddata_2004_10_TWO (
>   granID  integer not null,
>   scanID  tinyint unsigned not null,
>   fpID    tinyint not null,
>   cID     smallint not null,
>   cValue  float not null,
>   primary key (granID, scanID, fpID, cID),
>   index (granID, cID)
> )
>
> OLD QUERY: Select c1 from raddata_2004_10 where granID between 147568 and 150000;
>
> NEW QUERY: Select cValue from raddata_2004_10 where granID between 147568 and 150000 and cID=1; (should be v. fast)
Interesting :) It would make reads a lot faster, but I see a couple of
downsides. First, it means that I'm duplicating the 6 bytes worth of ID fields
(plus the row/index overhead) a LOT more -- the main problem with this setup is
the volume of data that I'm trying to address. With 2500 columns, I get 12150
rows per granule, or just short of 3 million rows a day. With the structure
you suggested, I'd get 7.29 billion rows a day... I wonder how the MySQL engine
would handle 235 billion rows per month in a single table. Add the extra overhead,
and I don't think we could manage the storage requirements: it already works out
to just under 10 TB a year as it is now, with the other tables, and this one table
alone would come to about 2.625 TB a month under that scheme, unfortunately.
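(Rough math, working from my numbers above: just short of 3 million rows a day
divided by 12,150 rows per granule is about 240 granules a day; 240 granules x
12,150 rows x 2,500 channels comes to roughly 7.29 billion rows a day; and at
12 bytes per row -- 6 bytes of IDs, 2 for cID, 4 for the float, before any
row/index overhead -- that's around 87 GB a day, or about 2.6 TB a month.)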
> ---> Incidentally, I have a question: when you have a table like this, with a primary key that spans a lot of columns, is there any performance benefit to adding a new primary key as an auto-increment column and keeping the old primary key as a unique index? I thought maybe there would be some addressing/hashing issues that worked out quicker?
Interesting idea. Not sure what the gain would be, at least in this case,
however. Most of the queries are based on channel values, or on other data like
geolocation or time, and they only relate back to this table through the three ID
fields. I'd be willing to give it a shot, but I'm not sure what an auto-increment
field would gain when the queries aren't based on insert order.
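If I'm following you, that would look something like this for the narrow table (the rowID name is just a placeholder):

create table raddata_2004_10 (
  rowID   bigint unsigned not null auto_increment,  -- placeholder surrogate key
  granID  integer not null,
  scanID  tinyint unsigned not null,
  fpID    tinyint not null,
  cID     smallint not null,
  cValue  float not null,
  primary key (rowID),
  unique key (granID, scanID, fpID, cID)            -- old primary key kept as a unique index
);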
Thanks for the insight! Appreciate all the suggestions that you guys are throwing into the hat!
ken

===========================================================================
  "Diplomacy is the weapon of the Civilized Warrior"  - Hun, A.T.

  Ken Gieselman                                     [EMAIL PROTECTED]
  System Administrator             http://www.endlessknot.com/~kgieselm
  Endlessknot Communications               http://www.endlessknot.com
===========================================================================