I tried using
<SkipNonContributingSources>true</SkipNonContributingSources> on this
dataset with GDAL 3.9.3, and it drastically sped things up.


Then I tested it on another, larger dataset, and the command errored out. I
then built GDAL from source in a Docker container with Even's patch included,
and the previously failing command succeeded. Memory usage also stayed within
a reasonable range, so everything is now resolved.
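
For reference, the relevant band element ends up looking roughly like the
following (simplified, with a placeholder file name rather than one from the
actual dataset):

<VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand">
  <PixelFunctionType>max</PixelFunctionType>
  <SkipNonContributingSources>true</SkipNonContributingSources>
  <ComplexSource>
    <SourceFilename relativeToVRT="1">source_0001.tif</SourceFilename>
    <SourceBand>1</SourceBand>
  </ComplexSource>
  <!-- ...one ComplexSource element per source raster... -->
</VRTRasterBand>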


I posted this as an answer to my StackExchange question (mentioned earlier
in this email thread) in case others run into the same issue:
https://gis.stackexchange.com/a/491960/142232

Thanks for the quick fix, Even! My only remaining concern is that the
documentation lists SkipNonContributingSources under Python Pixel Functions,
even though it applies to both C++ and Python pixel functions. Shouldn't it be
mentioned more generically under Derived Band? If you agree, please let me
know and I can attempt to update the documentation.

Thanks for your help.

Abdul Raheem Siddiqui
abdulraheemsiddi...@gmail.com


On Mon, Apr 14, 2025, 11:05 AM Rahkonen Jukka <
jukka.rahko...@maanmittauslaitos.fi> wrote:

> Hi,
>
> I got interested in trying what happens if all the images overlap completely.
> The timings from my test look good to me: 28 seconds for creating a max
> raster from 1000 single-band rasters, 1000x1000 pixels each. However, I am
> not totally sure that my test is valid, so I present it here.
>
> First, create 1000 images with gdal_create, each with a different pixel value:
>
> for /L %n in (1,1,1000) do gdal_create -outsize 1000 1000 -of gtiff -ot
> int16 -burn %n -a_srs epsg:4326 -a_ullr 20 30 30 20 %n.tif
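>
> (For Linux users, the same loop in bash should be roughly the following; it
> is just an untested translation of the Windows command above:
>
> for n in $(seq 1 1000); do gdal_create -outsize 1000 1000 -of GTiff -ot Int16 \
>   -burn $n -a_srs EPSG:4326 -a_ullr 20 30 30 20 $n.tif; done )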
>
> Then create a prototype VRT:
>
> gdalbuildvrt alloverlap_max.vrt *.tif
>
> Edit the VRT in a few places:
> - Change SimpleSource into ComplexSource everywhere
> - Add subClass="VRTDerivedRasterBand" and the pixel function:
> <VRTRasterBand dataType="Int16" band="1" subClass="VRTDerivedRasterBand">
> <PixelFunctionType>max</PixelFunctionType>
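>
> After these edits, one full band element should look roughly like this
> (abbreviated: only the first source is shown, and any other child elements
> that gdalbuildvrt writes are omitted):
>
> <VRTRasterBand dataType="Int16" band="1" subClass="VRTDerivedRasterBand">
>   <PixelFunctionType>max</PixelFunctionType>
>   <ComplexSource>
>     <SourceFilename relativeToVRT="1">1.tif</SourceFilename>
>     <SourceBand>1</SourceBand>
>     <SrcRect xOff="0" yOff="0" xSize="1000" ySize="1000"/>
>     <DstRect xOff="0" yOff="0" xSize="1000" ySize="1000"/>
>   </ComplexSource>
>   <!-- ...999 more ComplexSource elements... -->
> </VRTRasterBand>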
>
> Finally, convert it into a GeoTIFF:
>
> gdal_translate -of gtiff alloverlap_max.vrt max.tif
>
> The result looks correct: the max value of max.tif is 1000, even though the
> last image in the VRT is 999.tif with pixel values of 999. I also tried the
> min function and got 1 as expected.
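>
> (The actual minimum and maximum of the output can be verified with, for
> example, gdalinfo -mm max.tif, which forces computation of the real min/max
> values.)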
>
> -Jukka Rahkonen-
>
>
> ________________________________________
> From: gdal-dev on behalf of Even Rouault via gdal-dev
> Sent: Monday, 14 April 2025 16:45
> To: Abdul Raheem Siddiqui; gdal-dev@lists.osgeo.org
> Subject: Re: [gdal-dev] Performance Issue with VRT Pixel Function and Large
> Number of Source Rasters
>
> Abdul,
>
> if you add <SkipNonContributingSources>true</SkipNonContributingSources> as a
> child element of the <VRTRasterBand> element, and apply patch
> https://github.com/OSGeo/gdal/commit/3dbc60b334ee022f2993dca476b08d5fed01698c ,
> "gdal_translate -of GTiff merged.vrt OUTPUT.tif" completes in a few minutes.
>
> Cf
> https://gdal.org/en/stable/drivers/raster/vrt.html#using-derived-bands-with-pixel-functions-in-python
> for the doc of SkipNonContributingSources.
>
> Even
>
> On 14/04/2025 at 06:52, Abdul Raheem Siddiqui via gdal-dev wrote:
>
> Dear GDAL Community,
>
> I am encountering a performance issue when using a VRT consisting of a large
> number of source rasters and the built-in C++ pixel function ("max"). I would
> appreciate any guidance on whether some GDAL config option can improve this,
> whether I am doing something wrong, or whether this is a potential
> optimization opportunity.
>
> I have a VRT file referencing ~750 individual rasters (Byte data type, avg
> size ~1000x1000 pixels, untiled, and the same CRS for all source rasters).
> The VRT uses the built-in “max” pixel function.
>
> Running gdal_translate to convert the VRT to GTiff takes ~1.5 hours and
> consumes ~4GB RAM.
>
> gdal_translate -of GTiff merged.vrt OUTPUT.tif
>
> When tripling the number of source rasters (to ~2250) by duplicating entries
> in the VRT, processing time increases to ~4.5 hours, with RAM usage rising to
> ~11.5GB.
>
> gdal_translate -of GTiff merged_3x.vrt OUTPUT.tif
>
> Extracting a small subset of the VRT via -projwin does not improve
> performance (still slow, ~14GB RAM used).
>
> gdal_translate -projwin -146955.241 797044.4497 -138766.1444 789656.0648 -of
> GTiff merged_3x.vrt OUTPUT.tif
>
> Removing the pixel function makes processing instantaneous, even with 2250
> rasters.
>
> Performance is unaffected by --config GDAL_CACHEMAX or -co
> NUM_THREADS=ALL_CPUS.
>
> The issue persists across other datasets that I have. In fact, it gets much
> worse when the source rasters are relatively larger or of Float32 data type.
>
> Gdalinfo on one of the source rasters: https://pastebin.com/gjHDA2Wd
>
> Data: https://drive.google.com/file/d/1LGlzGGZvkPyXvKKgkPVzQGbBw55p5dRd/view?
>
> System: Windows, 8 CPUs, 32GB RAM (GDAL 3.9.2 via OSGeo4W).
>
> Thank you for your time and insights. Please reply if you are aware of how
> performance can be improved.
>
> Regards,
> Abdul Siddiqui, PE
> abdul.siddiqui@ertcorp.com
> ERT | Earth Resources Technology, Inc.
> 14401 Sweitzer Ln. Ste 300
> Laurel, MD 20707
> https://www.ertcorp.com
> --
> http://www.spatialys.com
> My software is free, but my time generally not.
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
