I tried using <SkipNonContributingSources>true</SkipNonContributingSources> on this dataset with GDAL 3.9.3, and it drastically sped things up.
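For anyone reading along, the element goes inside the derived-band definition of the VRT. A minimal, abbreviated sketch (file names, sizes, and the truncated source entries here are placeholders, not from the actual dataset):

```xml
<VRTDataset rasterXSize="1000" rasterYSize="1000">
  <VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand">
    <PixelFunctionType>max</PixelFunctionType>
    <SkipNonContributingSources>true</SkipNonContributingSources>
    <ComplexSource>
      <SourceFilename relativeToVRT="1">1.tif</SourceFilename>
      <SourceBand>1</SourceBand>
    </ComplexSource>
    <!-- ... ~750 more sources ... -->
  </VRTRasterBand>
</VRTDataset>
```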
Then I tested it on another, larger dataset, and the command errored out. I then built GDAL from source in a Docker container, including Even's patch, and the previously failing command succeeded. Memory usage was also within a reasonable range, so everything is now resolved. I posted this as an answer to my StackExchange question (mentioned earlier in this email thread) in case others run into the same issue: https://gis.stackexchange.com/a/491960/142232

Thanks for the quick fix, Even!

My only remaining concern is that the documentation lists SkipNonContributingSources under Python Pixel Functions, even though it applies to both C++ and Python pixel functions. Shouldn't it be mentioned more generically under Derived Bands? If agreed, please let me know and I can attempt to update the documentation.

Thanks for your help.

Abdul Raheem Siddiqui
abdulraheemsiddi...@gmail.com

On Mon, Apr 14, 2025, 11:05 AM Rahkonen Jukka <jukka.rahko...@maanmittauslaitos.fi> wrote:

> Hi,
>
> I got interested in trying what happens if all the images overlap totally. The
> timings from my test look good to me: 28 seconds for creating a max raster
> from 1000 single-band rasters, 1000x1000 pixels each. However, I am not
> totally sure my test is valid, so I present it here.
> First, create 1000 images with gdal_create, with a different pixel value in each:
>
> for /L %n in (1,1,1000) do gdal_create -outsize 1000 1000 -of gtiff -ot int16 -burn %n -a_srs epsg:4326 -a_ullr 20 30 30 20 %n.tif
>
> Then create a prototype of a VRT:
>
> gdalbuildvrt alloverlap_max.vrt *.tif
>
> Edit the VRT in a few places:
> - Change SimpleSource into ComplexSource everywhere
> - Add subClass="VRTDerivedRasterBand" and the pixel function:
>
> <VRTRasterBand dataType="Int16" band="1" subClass="VRTDerivedRasterBand">
>   <PixelFunctionType>max</PixelFunctionType>
>
> Finally, convert into GeoTIFF:
>
> gdal_translate -of gtiff alloverlap_max.vrt max.tif
>
> The result looks correct: the max value of max.tif is 1000, even though the last
> image in the VRT is 999.tif with pixel values = 999. I also tried the min
> function and got 1 as expected.
>
> -Jukka Rahkonen-
>
> ________________________________________
> From: gdal-dev on behalf of Even Rouault via gdal-dev
> Sent: Monday, 14 April 2025 16:45
> To: Abdul Raheem Siddiqui; gdal-dev@lists.osgeo.org
> Subject: Re: [gdal-dev] Performance Issue with VRT Pixel Function and Large Number of Source Rasters
>
> Abdul,
>
> if you add <SkipNonContributingSources>true</SkipNonContributingSources> as a child
> element of the <VRTRasterBand> element, and apply patch
> https://github.com/OSGeo/gdal/commit/3dbc60b334ee022f2993dca476b08d5fed01698c ,
> "gdal_translate -of GTiff merged.vrt OUTPUT.tif" completes in a few minutes.
>
> Cf https://gdal.org/en/stable/drivers/raster/vrt.html#using-derived-bands-with-pixel-functions-in-python
> for the doc of SkipNonContributingSources.
>
> Even
>
> On 14/04/2025 at 06:52, Abdul Raheem Siddiqui via gdal-dev wrote:
>
> Dear GDAL Community,
>
> I am encountering a performance issue when using a VRT consisting of a large
> number of source rasters and the built-in C++ pixel function ("max").
> I would appreciate any guidance on whether some GDAL config option can improve
> this, whether I am doing something wrong, or whether this is a potential
> optimization opportunity.
>
> I have a VRT file referencing ~750 individual rasters (Byte data type, avg
> size ~1000x1000 pixels, untiled, and the same CRS for all source rasters). The
> VRT uses the built-in "max" pixel function.
>
> Running gdal_translate to convert the VRT to GTiff takes ~1.5 hours and
> consumes ~4GB RAM:
>
> gdal_translate -of GTiff merged.vrt OUTPUT.tif
>
> When tripling the number of source rasters (to ~2250) by duplicating entries
> in the VRT, processing time increases to ~4.5 hours, with RAM usage rising to
> ~11.5GB:
>
> gdal_translate -of GTiff merged_3x.vrt OUTPUT.tif
>
> Extracting a small subset of the VRT via -projwin does not improve performance
> (still slow, ~14GB RAM used):
>
> gdal_translate -projwin -146955.241 797044.4497 -138766.1444 789656.0648 -of GTiff merged_3x.vrt OUTPUT.tif
>
> Removing the pixel function makes processing instantaneous, even with 2250
> rasters. Performance is unaffected by --config GDAL_CACHEMAX or -co
> NUM_THREADS=ALL_CPUS.
>
> The issue persists across other datasets that I have. In fact, it gets much
> worse when the source rasters are relatively larger or of Float32 data type.
>
> Gdalinfo on one of the source rasters: https://pastebin.com/gjHDA2Wd
> Data: https://drive.google.com/file/d/1LGlzGGZvkPyXvKKgkPVzQGbBw55p5dRd/view?
> System: Windows, 8 CPUs, 32GB RAM (GDAL 3.9.2 via OSGeo4W).
>
> Thank you for your time and insights. Please reply if you are aware of how
> performance can be improved.
>
> Regards,
>
> Abdul Siddiqui, PE
> abdul.siddiqui@ertcorp.com
> ERT | Earth Resources Technology, Inc.
> 14401 Sweitzer Ln. Ste 300
> Laurel, MD 20707
> https://www.ertcorp.com
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev@lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
> --
> http://www.spatialys.com
> My software is free, but my time generally not.
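As an aside, the manual VRT edits Jukka describes (renaming SimpleSource to ComplexSource, marking the band as a derived band, and adding the pixel function plus SkipNonContributingSources) can be scripted. A minimal sketch using only Python's standard library; the helper name is made up, and the tag/attribute names follow the GDAL VRT format:

```python
# Sketch: apply the manual post-gdalbuildvrt edits to a VRT document.
import xml.etree.ElementTree as ET

def add_pixel_function(vrt_xml: str, function: str = "max",
                       skip_non_contributing: bool = True) -> str:
    """Turn each plain VRTRasterBand into a derived band with a pixel function."""
    root = ET.fromstring(vrt_xml)
    for band in root.iter("VRTRasterBand"):
        # Mark the band as a derived band and declare the pixel function.
        band.set("subClass", "VRTDerivedRasterBand")
        pf = ET.Element("PixelFunctionType")
        pf.text = function
        band.insert(0, pf)
        if skip_non_contributing:
            skip = ET.Element("SkipNonContributingSources")
            skip.text = "true"
            band.insert(1, skip)
        # Rename every SimpleSource to ComplexSource in place.
        for src in band.iter("SimpleSource"):
            src.tag = "ComplexSource"
    return ET.tostring(root, encoding="unicode")
```

The edited string can then be written back to disk and fed to gdal_translate as before.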