Great discussion about improving raster processing. As opposed to the ideas above of incorporating new machinery into GDAL, it might also be interesting to consider changing the API (a bit) so GDAL integrates better with existing tools.
Within the Python ecosystem, for example, some great tools for serious data processing have emerged recently. A combination of Dask (including Dask.distributed) and Numba, for instance, allows relatively easy scaling to multicore (including GPU) and to clusters. Dask easily handles windowed functions as discussed above, but also much more complicated relations between 'blocks', for example with linear algebra, matrix multiplication, etc.

Here is a small experiment I made a while back to test whether I could use GDAL for the I/O to populate Dask arrays. It computes the slope, a windowed function, from an elevation map: http://nbviewer.jupyter.org/gist/RutgerK/a2ad8c074c78d000dd4a1e35cc229dee

You basically only need a Python '__getitem__' interface, so a Dataset can be indexed similarly to a NumPy array, plus some attributes like the shape of the data. That's much easier to implement than handling and scheduling blocks yourself. The downside is of course that you rely on third parties, and you get a different experience depending on the programming language you use, because the availability of packages/modules differs. My example uses Python and would therefore be useless to users of other languages.

By the way, has anyone thought of using vectorized Numba functions as GDAL pixel functions? I have no clue whether that would technically work or even make sense. Perhaps it would be a lot easier, and faster, than using Python-derived functions.

Regards,
Rutger

--
View this message in context: http://osgeo-org.1560.x6.nabble.com/GDAL-raster-processing-library-tp5275948p5280032.html
Sent from the GDAL - Dev mailing list archive at Nabble.com.
_______________________________________________
gdal-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/gdal-dev
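A minimal sketch of the '__getitem__' adapter idea described above, with a NumPy array standing in for a GDAL band (the `RasterAdapter` name and the tilted-plane test surface are made up for illustration; real code would translate the requested window into a `band.ReadAsArray(xoff, yoff, xsize, ysize)` call):

```python
import numpy as np

class RasterAdapter:
    """Hypothetical array-like wrapper around a raster band.

    A NumPy array stands in for the GDAL band here; in practice
    __getitem__ would turn the requested slice into a
    band.ReadAsArray(xoff, yoff, xsize, ysize) call.
    """
    def __init__(self, data):
        self._data = data
        self.shape = data.shape
        self.dtype = data.dtype

    def __getitem__(self, key):
        # Dask (via da.from_array) only ever requests rectangular
        # windows, so supporting basic slicing is enough.
        return self._data[key]

# Elevation surface: a tilted plane rising 1 unit per cell in x.
elev = np.fromfunction(lambda y, x: x.astype(float), (64, 64))
src = RasterAdapter(elev)

# Something like da.from_array(src, chunks=(32, 32)) would now work,
# because Dask only needs shape, dtype and __getitem__.
window = src[16:48, 16:48]  # a 32x32 block read through the adapter

# Slope of the block via central differences (the windowed function
# the notebook computes); for this plane it is 45 degrees everywhere.
dy, dx = np.gradient(window)
slope = np.degrees(np.arctan(np.hypot(dx, dy)))
```

Once such an object exists, Dask does the block scheduling itself; the adapter never has to know about chunking or parallelism.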
