Thanks a lot, Merlin. I will load a sample of grids into a new table with different subgrid sizes in order to find the best trade-off between space and speed.
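Something along these lines is what I plan to try first. This is just a rough sketch: the table and column names, the 4x4 split into tiles of up to 91x180 cells, and the rebased array indexes in the query are assumptions I still have to validate, not a final design.

CREATE TABLE grid_f_data_i2_sub (
  -- Same inventory reference as in grid_f_data_i2
  id_inventory integer REFERENCES grid_d_inventory(id_inventory),
  dh_date timestamp,
  -- Which of the 4x4 tiles this row holds (0..3 on each axis)
  nu_tile_y smallint,
  nu_tile_x smallint,
  -- The 361x720 grid split into tiles of up to 91x180 cells
  -- (PostgreSQL ignores the declared sizes, so the last row of tiles
  -- may hold only 88 latitude lines)
  vl_grid smallint[91][180],
  CONSTRAINT grid_f_data_i2_sub_pk
    PRIMARY KEY (id_inventory, dh_date, nu_tile_y, nu_tile_x)
);

-- Historical series for the cell that is vl_grid[123][152] in the big table:
--   tile_y = floor((123 - 1) / 91)  = 1, row inside tile = (123 - 1) mod 91  + 1 = 32
--   tile_x = floor((152 - 1) / 180) = 0, col inside tile = (152 - 1) mod 180 + 1 = 152
SELECT dh_date, vl_grid[32][152]
FROM grid_f_data_i2_sub
WHERE id_inventory = 6
  AND nu_tile_y = 1
  AND nu_tile_x = 0
ORDER BY dh_date;

The idea is that a single-point query only has to detoast an array roughly 16 times smaller, and I can repeat the measurement with other tile sizes (2x2, 8x8, ...) to see where compression and lookup speed balance out.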
Regards

2009/7/22 Merlin Moncure <mmonc...@gmail.com>

> On Tue, Jul 21, 2009 at 7:43 PM, Victor de Buen
> (Bayes)<vdeb...@bayesinf.com> wrote:
> > Hi
> >
> > I'm storing historical meteorological gridded data from GFS
> > (http://www.nco.ncep.noaa.gov/pmb/products/gfs/) into an array field in a
> > table like this:
> >
> > CREATE TABLE grid_f_data_i2 (
> >   -- Specifies the variable and other features of the data
> >   id_inventory integer REFERENCES grid_d_inventory(id_inventory),
> >   -- A new grid has been available every 3 hours for the last 5 years
> >   dh_date timestamp,
> >   -- Values are scaled to be stored as signed 2-byte integers
> >   vl_grid smallint[361][720],
> >   CONSTRAINT meteo_f_gfs_tmp PRIMARY KEY (id_inventory, dh_date)
> > );
> >
> > The dimensions of each value of field vl_grid are (lat: 361 x lon: 720 =
> > 259920 cells) for a grid of 0.5 degrees (about 55 km between points)
> > around the world. So, vl_grid[y][x] stores the value at dh_date of the
> > meteorological variable specified by id_inventory at the geodesic point
> >
> >   latitude  = -90 + y*0.5
> >   longitude = x*0.5
> >
> > The reverse formula for the closest grid point to an arbitrary geodesic
> > point is
> >
> >   y = Round((latitude + 90)*2)
> >   x = Round(longitude*2)
> >
> > Field vl_grid is stored in the TOAST table and has a good compression
> > level. PostgreSQL is the only database that is able to store this huge
> > amount of data in only 34 GB of disk. It's a really great system. Queries
> > returning big rectangular areas are very fast, but the target of almost
> > all queries is to get historical series for a geodesic point
> >
> >   SELECT dh_date, vl_grid[123][152]
> >   FROM grid_f_data_i2
> >   WHERE id_inventory = 6
> >   ORDER BY dh_date
> >
> > In this case, atomic access to just one cell in each of only a few
> > thousand rows becomes too slow.
>
> That's a side effect of your use of arrays. Arrays are very compact,
> and ideal if you always want the whole block of data at once, but
> asking for a particular point is the downside of your trade-off. I
> would suggest experimenting with smaller grid sizes...maybe divide
> your big grid into approximately 16 (4x4) separate subgrids. This
> should still 'toast' and give decent compression, but mitigate the
> impact of single-point lookup somewhat.
>
> merlin

--
Víctor de Buen Remiro
Statistical consultant
Bayes Forecast
www.bayesforecast.com
Tol Development Team member
www.tol-project.org