Hi Even, This is a really cool and really impressive feature! Calling Python code from C++ without development packages as a dependency sounds like black magic to me. Obviously Python symbols have to be there at some point to execute Python code, so this is only usable from a binary that happens to load them already (CPython, QGIS, etc.), correct? In particular you say that it is "done at run-time by dynamically loading Python symbols". But I'm confused because later you mention incompatibilities issues between the CPython and Pypy API. GDAL's secret sauce, I guess...? I'm also curious why it's possible to use numba but no pypy, which AFAIK are both python JITs? And finally did you consider using Cython (which claims pypy compatibility)?
Cheers, Victor Poughon ________________________________________ De : gdal-dev [[email protected]] de la part de Even Rouault [[email protected]] Envoyé : lundi 12 septembre 2016 14:31 À : [email protected] Objet : [gdal-dev] VRT derived band pixel functions written in Python Hi, I wanted to mention a new (and I think pretty cool ) feature I've added in trunk: the possibility to define pixel functions in Python in a VRT derived band. Let's start with a simple example to multiply a raster by 1.5 : <VRTDataset rasterXSize="20" rasterYSize="20"> <SRS>EPSG:26711</SRS> <GeoTransform>440720,60,0,3751320,0,-60</GeoTransform> <VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand"> <PixelFunctionType>multiply</PixelFunctionType> <PixelFunctionLanguage>Python</PixelFunctionLanguage> <PixelFunctionArguments factor="1.5"/> <PixelFunctionCode><![CDATA[ import numpy as np def multiply(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs): factor = float(kwargs['factor']) out_ar[:] = np.round_(np.clip(in_ar[0] * factor,0,255)) ]]> </PixelFunctionCode> <SimpleSource> <SourceFilename relativeToVRT="1">byte.tif</SourceFilename> </SimpleSource> </VRTRasterBand> </VRTDataset> or to add 2 (or more) rasters: <VRTDataset rasterXSize="20" rasterYSize="20"> <SRS>EPSG:26711</SRS> <GeoTransform>440720,60,0,3751320,0,-60</GeoTransform> <VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand"> <PixelFunctionType>add</PixelFunctionType> <PixelFunctionLanguage>Python</PixelFunctionLanguage> <PixelFunctionCode><![CDATA[ import numpy as np def add(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs): np.round_(np.clip(np.sum(in_ar, axis = 0, dtype = 'uint16'),0,255), out = out_ar) ]]> </PixelFunctionCode> <SimpleSource> <SourceFilename relativeToVRT="1">byte.tif</SourceFilename> </SimpleSource> <SimpleSource> <SourceFilename relativeToVRT="1">byte2.tif</SourceFilename> </SimpleSource> </VRTRasterBand> </VRTDataset> You can put any python module code inside PixelFunctionCode, with at least one function with the following arguments : - in_ar: list of input numpy arrays (one numpy array for each source) - out_ar: output numpy array to fill (initialized at the right dimensions and with the VRTRasterBand.dataType) - xoff: pixel offset to the top left corner of the accessed region of the band - yoff: line offset to the top left corner of the accessed region of the band - xsize: width of the region of the accessed region of the band - ysize: height of the region of the accessed region of the band - raster_xsize: total with of the raster band - raster_ysize: total height of the raster band - buf_radius: radius of the buffer (in pixels) added to the left, right, top and bottom of in_ar / out_ar - gt: geotransform - kwargs: dictionnary with user arguments defined in PixelFunctionArguments For basic operations, you just need to care about in_ar and out_ar. With all that, you can do interesting stuff like hillshading (code ported from gdaldem): <VRTDataset rasterXSize="121" rasterYSize="121"> <SRS>EPSG:4326</SRS> <GeoTransform>-80.004166666666663,0.008333333333333,0, 44.004166666666663,0,-0.008333333333333</GeoTransform> <VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand"> <ColorInterp>Gray</ColorInterp> <SimpleSource> <SourceFilename relativeToVRT="1">n43.dt0</SourceFilename> </SimpleSource> <PixelFunctionLanguage>Python</PixelFunctionLanguage> <PixelFunctionType>hillshade</PixelFunctionType> <PixelFunctionArguments scale="111120" z_factor="30" /> <PixelFunctionCode> <![CDATA[ # Licence: X/MIT # Copyright 2016, Even Rouault import math def hillshade_int(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, radius, gt, z, scale): ovr_scale_x = float(out_ar.shape[1] - 2 * radius) / xsize ovr_scale_y = float(out_ar.shape[0] - 2 * radius) / ysize ewres = gt[1] / ovr_scale_x nsres = gt[5] / ovr_scale_y inv_nsres = 1.0 / nsres inv_ewres = 1.0 / ewres az = 315 alt = 45 degreesToRadians = math.pi / 180 sin_alt = math.sin(alt * degreesToRadians) azRadians = az * degreesToRadians z_scale_factor = z / (8 * scale) cos_alt_mul_z_scale_factor = \ math.cos(alt * degreesToRadians) * z_scale_factor cos_az_mul_cos_alt_mul_z_scale_factor_mul_254 = \ 254 * math.cos(azRadians) * cos_alt_mul_z_scale_factor sin_az_mul_cos_alt_mul_z_scale_factor_mul_254 = \ 254 * math.sin(azRadians) * cos_alt_mul_z_scale_factor square_z_scale_factor = z_scale_factor * z_scale_factor sin_alt_mul_254 = 254.0 * sin_alt for j in range(radius, out_ar.shape[0]-radius): win_line = in_ar[0][j-radius:j+radius+1,:] for i in range(radius, out_ar.shape[1]-radius): win = win_line[:,i-radius:i+radius+1].tolist() x = inv_ewres * ((win[0][0] + win[1][0] + win[1][0] + win[2][0])-\ (win[0][2] + win[1][2] + win[1][2] + win[2][2])) y = inv_nsres * ((win[2][0] + win[2][1] + win[2][1] + win[2][2])-\ (win[0][0] + win[0][1] + win[0][1] + win[0][2])) xx_plus_yy = x * x + y * y cang_mul_254 = (sin_alt_mul_254 - \ (y * cos_az_mul_cos_alt_mul_z_scale_factor_mul_254 - \ x * sin_az_mul_cos_alt_mul_z_scale_factor_mul_254)) / \ math.sqrt(1 + square_z_scale_factor * xx_plus_yy) if cang_mul_254 < 0: out_ar[j,i] = 1 else: out_ar[j,i] = 1 + round(cang_mul_254) def hillshade(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, radius, gt, **kwargs): z = float(kwargs['z_factor']) scale= float(kwargs['scale']) hillshade_int(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, radius, gt, z, scale) ]]> </PixelFunctionCode> <BufferRadius>1</BufferRadius> <SourceTransferType>Int16</SourceTransferType> </VRTRasterBand> </VRTDataset> You can completely offload the python code itself into a proper my_lib.py file and just specify <PixelFunctionType>my_lib.hillshade</PixelFunctionType> Technically, the interfacing with Python is done at run-time by dynamically loading Python symbols with dlopen()/GetProcAddress(), when they are already available in the process. For example if libgdal is loaded from a Python interpreter, or from a binary like QGIS which has already loaded the Python lib. Otherwise a few shared objects ("libpython2.7.so", "python27.dll", "libpython3.4m.so", etc.) are tried, unless the PYTHONSO config option is specified to point to a precise filename. The advantage of this approach is that the same GDAL library binary is compatible of all Python 2.X (tested: 2.6, 2.7) and 3.X (tested: 3.1, 3.4) versions, and any numpy version that comes in the Python environment used (at compilation time you don't need any python/numpy development package). The numpy dependency is not a critical one: one could imagine a fallback mode where Python arrays would be used instead, but this has likely little interest. Successfully tested on Linux, MacOSX, FreeBSD and Windows. Some extra tweaking of the predefined set of shared object names - that are probed when no already loaded Python environment is found - might be needed. The implementation should be thread-safe regarding use of the Python Global Interpreter Lock (GIL). I've also tested with the numba Just-In-Time compiler (http://numba.pydata.org/) and it provides major performance improvements for highly computational code (example given below). With numba enabled, I found that my above Python hillshading on a 10Kx10K float32 raster was even faster than gdaldem (the reason is that the Python version is a bit simplified as it doesn't take into account input nodata values), whereas the non-jit'ed one is 100x slower. When removing the nodata flag, gdaldem is only twice faster as the jit'ed python code. So this is a good sign that such approach isn't only a toy or just for prototyping. Speaking of JIT, there's no provision (yet?) for interfacing with PyPy. Would require a new backend as PyPy C API has nothing to do with the CPython one. There are obvious security concerns in allowing Python code to be run when getting the content of a vrt file. The GDAL_VRT_ENABLE_PYTHON config option = IF_SAFE / NO / YES can be set to control the behaviour. The default is IF_SAFE (can be change at compilation time by defining -DGDAL_VRT_ENABLE_PYTHON_DEFAULT="NO" e.g. And Python code execution can be completely disabled with -DGDAL_VRT_DISABLE_PYTHON). Safe must be understood as: the code will not read, write, delete... files, spawn external code, do network activity, etc. Said otherwise, the code will only do "maths". But infinite looping is something definitely possible in the safe mode. The heuristics of the IF_SAFE mode is rather basic and I'd be grateful if people could point ways of breaking it. If any of the following strings - "import" (unless it is "import numpy" / "from numpy import ...", "import math" / "from math import ..." or "from numba import jit"), "eval", "compile", "open", "load", "file", "input", "save", "memmap", "DataSource", "genfromtxt", "getattr", "ctypeslib", "testing", "dump", "fromregex" - is found anywhere in the code, then the code is considered unsafe (there are interestingly a lot of methods in numpy to do file I/O. Hopefully I've captured them with the previous filters). Another 'unsafe' pattern is when the pixel function references an external module like my above my_lib.hillshade example (who knows if there will not be some day a shutil.reformat_your_hard_drive function with the right prototype...) This new capability isn't yet documented in the VRT doc, although this message will be a start. I'm interested in feedback you may have. And to conclude with a fun example: a raster with a Mandelbrot fractal. Just a grey-level version. Let to the reader as an exercice: add a color table. To be opened for example in QGIS and enjoy the almost infinite zoom feeling. Make sure to disable contrast enhancement. It uses numba when available, and when this is the case, it's really fast when paning/zooming. <VRTDataset rasterXSize="100000000" rasterYSize="100000000"> <VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand"> <PixelFunctionLanguage>Python</PixelFunctionLanguage> <PixelFunctionType>mandelbrot</PixelFunctionType> <PixelFunctionCode> <![CDATA[ try: from numba import jit #print('Using numba') g_max_iterations = 100 except: class jit(object): def __init__(self, nopython = True, nogil = True): pass def __call__(self, f): return f #print('Using non-JIT version') g_max_iterations = 25 # Use a wrapper since the VRT code cannot access the jit decorated function def mandelbrot(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, r, gt, **kwargs): mandelbrot_jit(out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, g_max_iterations) @jit(nopython=True, nogil=True) def mandelbrot_jit(out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, max_iterations): ovr_factor_y = float(out_ar.shape[0]) / ysize ovr_factor_x = float(out_ar.shape[1]) / xsize for j in range( out_ar.shape[0]): y0 = 2.0 * (yoff + j / ovr_factor_y) / raster_ysize - 1 for i in range(out_ar.shape[1]): x0 = 3.5 * (xoff + i / ovr_factor_x) / raster_xsize - 2.5 x = 0.0 y = 0.0 x2 = 0.0 y2 = 0.0 iteration = 0 while x2 + y2 < 4 and iteration < max_iterations: y = 2*x*y + y0 x = x2 - y2 + x0 x2 = x * x y2 = y * y iteration += 1 out_ar[j][i] = iteration * 255 / max_iterations ]]> </PixelFunctionCode> <Metadata> <MDI key="STATISTICS_MAXIMUM">255</MDI> <MDI key="STATISTICS_MEAN">127</MDI> <MDI key="STATISTICS_MINIMUM">0</MDI> <MDI key="STATISTICS_STDDEV">127</MDI> </Metadata> <ColorInterp>Gray</ColorInterp> <Histograms> <HistItem> <HistMin>-0.5</HistMin> <HistMax>255.5</HistMax> <BucketCount>256</BucketCount> <IncludeOutOfRange>0</IncludeOutOfRange> <Approximate>1</Approximate> <HistCounts>0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0| 0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0| 0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0| 0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0| 0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0| 0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0| 0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0</HistCounts> </HistItem> </Histograms> </VRTRasterBand> </VRTDataset> Statistics have been added just to make QGIS open the file a bit quicker. Even -- Spatialys - Geospatial professional services http://www.spatialys.com _______________________________________________ gdal-dev mailing list [email protected] http://lists.osgeo.org/mailman/listinfo/gdal-dev _______________________________________________ gdal-dev mailing list [email protected] http://lists.osgeo.org/mailman/listinfo/gdal-dev
