Michael,
Scott is right.  Not sure if this is the preferred approach, but I accomplished 
this for large datasets by specifying buffer sizes for ReadAsArray.  The doc I 
consulted is here: 
http://gdal.org/python/osgeo.gdal_array-module.html#BandReadAsArray.  
I used masked arrays to exclude nodata values - you may not need to worry 
about this.
-David

Excerpt from my script:

from osgeo import gdal
import numpy

src_ds = gdal.Open(src_fn, gdal.GA_ReadOnly)
b = src_ds.GetRasterBand(1)
ndv = b.GetNoDataValue()
ns = src_ds.RasterXSize
nl = src_ds.RasterYSize

#Don't want to load the entire dataset for stats computation
#This is maximum dimension for reduced resolution array
max_dim = 1024.

scale_ns = ns/max_dim
scale_nl = nl/max_dim
scale_max = max(scale_ns, scale_nl)

if scale_max > 1:
    nl = int(round(nl/scale_max))
    ns = int(round(ns/scale_max))

#The buf_xsize/buf_ysize parameters determine the final array dimensions
#(ReadAsArray already returns a numpy array, so no extra wrapping is needed)
bm = numpy.ma.masked_equal(b.ReadAsArray(buf_xsize=ns, buf_ysize=nl), ndv)
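
Once the reduced-resolution array is in hand, the masked array gives you nodata-aware statistics for free.  A numpy-only sketch (the array values and nodata value here are made up for illustration, standing in for what ReadAsArray would return):

```python
import numpy

ndv = -9999.0  # hypothetical nodata value
data = numpy.array([[1.0, 2.0, ndv],
                    [3.0, ndv, 4.0]])

# Mask out the nodata cells, as in the excerpt above
bm = numpy.ma.masked_equal(data, ndv)

# Statistics silently skip the masked pixels
print(bm.mean())   # 2.5 -- mean of the 4 valid values
print(bm.count())  # 4 valid pixels
```

Without the mask, the nodata sentinel values would badly skew min/max/mean.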


On Apr 11, 2012, at 11:17 AM, Scott Arko wrote:

> Hi Michael,
> 
> 
> I may be missing your question, but why aren't you just using ReadAsArray?  
> It has an option to return a smaller array from the input array.  Now, I'm 
> not sure how it does the resampling (you could look to see), but you can make 
> a call like
> 
> data = banddata.ReadAsArray(0, 0, filehandle.RasterXSize,
>                             filehandle.RasterYSize, xsize, ysize)
> 
> where xsize and ysize are smaller than the true RasterXSize or RasterYSize.  
> I haven't looked at this in a while, but I'm pretty sure this will work.  Did 
> I miss the point of what you were asking?
> 
> 
> Thanks,
> Scott
> 
> 
> On Wed, Apr 11, 2012 at 6:31 AM, K.-Michael Aye <[email protected]> 
> wrote:
> Dear all,
> 
> is there a Python API for downsampling a huge dataset?
> What I would like to do:
> 
> * get my dataset
> * read out RasterXSize and RasterYSize
> * calculate how many lines and rows I need to skip to get a quick overview 
> image, e.g. 10 lines to skip.
> * Have a ReadAsArray interface where I can say something like this:
> ** data = ds.ReadAsArray(xoffset, yoffset, 10000, 10000, skipping=10)
> 
> which in numpy terms would give me every 10th line and sample, like this: 
> array[::10, ::10]
> 
> I really don't need quality at all, just speed, for a rough overview for 
> further zooming in with lassos, as the images I deal with sometimes have more 
> than 200 MPixels.
> 
> Is this possible in Python?
> I was thinking now, maybe one could use numpy's memmap somehow for this, 
> don't know much about it, though…
> 
> Thanks for any hints!
> 
> Best regards,
> Michael
> 
> 
> _______________________________________________
> gdal-dev mailing list
> [email protected]
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
> 
> 
> 

