Thanks a lot, Merlin. I will load a sample of grids into a new table with different subgrid sizes in order to find the best trade-off between space and speed.
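Something along these lines is what I plan to try first. This is just a rough sketch: the table and column names, the 4x4 split into tiles of up to 91x180 cells, and the rebased array indexes in the query are assumptions I still have to validate, not a final design.

CREATE TABLE grid_f_data_i2_sub (
  -- Same inventory reference as in grid_f_data_i2
  id_inventory integer REFERENCES grid_d_inventory(id_inventory),
  dh_date timestamp,
  -- Which of the 4x4 tiles this row holds (0..3 on each axis)
  nu_tile_y smallint,
  nu_tile_x smallint,
  -- The 361x720 grid split into tiles of up to 91x180 cells
  -- (PostgreSQL ignores the declared sizes, so the last row of tiles
  -- may hold only 88 latitude lines)
  vl_grid smallint[91][180],
  CONSTRAINT grid_f_data_i2_sub_pk
    PRIMARY KEY (id_inventory, dh_date, nu_tile_y, nu_tile_x)
);

-- Historical series for the cell that is vl_grid[123][152] in the big table:
--   tile_y = floor((123 - 1) / 91)  = 1, row inside tile = (123 - 1) mod 91  + 1 = 32
--   tile_x = floor((152 - 1) / 180) = 0, col inside tile = (152 - 1) mod 180 + 1 = 152
SELECT dh_date, vl_grid[32][152]
FROM grid_f_data_i2_sub
WHERE id_inventory = 6
  AND nu_tile_y = 1
  AND nu_tile_x = 0
ORDER BY dh_date;

The idea is that a single-point query only has to detoast an array roughly 16 times smaller, and I can repeat the measurement with other tile sizes (2x2, 8x8, ...) to see where compression and lookup speed balance out.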
Regards

2009/7/22 Merlin Moncure <mmonc...@gmail.com>

> On Tue, Jul 21, 2009 at 7:43 PM, Victor de Buen
> (Bayes)<vdeb...@bayesinf.com> wrote:
> > Hi
> >
> > I'm storing historical meteorological gridded data from GFS
> > (http://www.nco.ncep.noaa.gov/pmb/products/gfs/) into an array field in a
> > table like this:
> >
> > CREATE TABLE grid_f_data_i2 (
> >   -- Specifies the variable and other features of the data
> >   id_inventory integer REFERENCES grid_d_inventory(id_inventory),
> >   -- A new grid has been available every 3 hours for the last 5 years
> >   dh_date timestamp,
> >   -- Values are scaled to be stored as signed 2-byte integers
> >   vl_grid smallint[361][720],
> >   CONSTRAINT meteo_f_gfs_tmp PRIMARY KEY (id_inventory, dh_date)
> > );
> >
> > The dimensions of each value of field vl_grid are (lat: 361 x lon: 720 =
> > 259920 cells) for a grid of 0.5 degrees (about 55 km between points)
> > around the world. So, vl_grid[y][x] stores the value at dh_date of the
> > meteorological variable specified by id_inventory at the geodesic point
> >
> >   latitude  = -90 + y*0.5
> >   longitude = x*0.5
> >
> > The reverse formula for the closest grid point to an arbitrary geodesic
> > point is
> >
> >   y = Round((latitude + 90)*2)
> >   x = Round(longitude*2)
> >
> > Field vl_grid is stored in the TOAST table and has a good compression
> > level. PostgreSQL is the only database that is able to store this huge
> > amount of data in only 34 GB of disk. It's a really great system. Queries
> > returning big rectangular areas are very fast, but the target of almost
> > all queries is to get historical series for a geodesic point
> >
> >   SELECT dh_date, vl_grid[123][152]
> >   FROM grid_f_data_i2
> >   WHERE id_inventory = 6
> >   ORDER BY dh_date
> >
> > In this case, atomic access to just one cell in each of only a few
> > thousand rows becomes too slow.
>
> That's a side effect of your use of arrays. Arrays are very compact,
> and ideal if you always want the whole block of data at once, but
> asking for a particular point is the downside of your trade-off. I
> would suggest experimenting with smaller grid sizes...maybe divide
> your big grid into approximately 16 (4x4) separate subgrids. This
> should still 'toast' and give decent compression, but mitigate the
> impact of single-point lookup somewhat.
>
> merlin

--
Víctor de Buen Remiro
Statistical consultant
Bayes Forecast
www.bayesforecast.com
Tol Development Team member
www.tol-project.org