Hi,

Since it's not exactly clear from your description: what operations are you running? Just the equivalent of gdal.Translate()? gdal.Warp()?

GDAL can use threading in a couple of places:

• to compress the output before writing it, e.g. the NUM_THREADS creation option of GTiff
• to decompress the input when reading a region larger than one block or strip, e.g. the NUM_THREADS open option of GTiff
• to pipeline the I/O and the warping in gdalwarp (-multi)
• to parallelize the warping itself in gdalwarp (-wo NUM_THREADS)

And of course, there might be others I'm not aware of. A rough sketch of where these plug in from the Python bindings follows after the next paragraph.

I'm not sure about the effects you see when setting the cache, but note that the default GDAL_CACHEMAX is "5% of the usable physical RAM, [...] consulted the first time the cache size is requested". To disable the cache you can use GDAL_CACHEMAX=0, which can reduce memory usage and speed up the program in very specific cases (e.g. when processing one block at a time without reading parts of the input twice), but it becomes a lot less useful once you do any kind of warping or resampling.
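For illustration, a minimal sketch of wiring the threading options above up through the Python bindings (the file names, CRS and resolution are made up, and I'm assuming GTiff output):

    from osgeo import gdal

    gdal.UseExceptions()

    # Hypothetical paths and parameters, for illustration only.
    gdal.Warp(
        "output.tif",
        "input.tif",
        dstSRS="EPSG:32618",
        xRes=30,
        yRes=30,
        multithread=True,                         # -multi: pipeline I/O and warping
        warpOptions=["NUM_THREADS=ALL_CPUS"],     # -wo NUM_THREADS: parallel warp
        creationOptions=["COMPRESS=DEFLATE",      # GTiff creation options:
                         "NUM_THREADS=ALL_CPUS"], # parallel output compression
    )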
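And for the cache side, again just a sketch (the 512 MB figure is an arbitrary example, not a recommendation):

    from osgeo import gdal

    # SetCacheMax() takes a value in bytes. The GDAL_CACHEMAX config
    # option, by contrast, is only consulted the first time the cache
    # size is requested, so it has to be set early in the program.
    gdal.SetCacheMax(512 * 1024 * 1024)   # arbitrary 512 MB example

    # Roughly equivalent via the config option; a plain small number
    # is read as megabytes, and "0" disables the block cache.
    # gdal.SetConfigOption("GDAL_CACHEMAX", "512")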
Laurentiu

On Tue, Apr 1, 2025, at 10:19, Varisht Ghedia via gdal-dev wrote:
> Dear GDAL Developers,
>
> I am working on optimizing the processing times for MODIS datasets (LST_1Km
> and QC Day tile) using `pymodis` with some modifications. Specifically, I
> have added flags for:
>
> • Running on all available CPU cores (`ALL_CORES`)
> • Adjusting the GDAL cache size (`GDAL_CACHEMAX`)
>
> However, I am observing unexpected performance variations. In some cases,
> increasing the cache size degrades performance instead of improving it. Below
> are my test results for two different datasets from the same tile. Tile used:
> MOD11A1.A2025073.h10v10.061.2025074095514.hdf
>
> EPSG:32618, resampled to 30 m
>
> *QC_tile.tif*
>
> ALL_CORES + 2G
> real 0m24.199s
> user 0m53.352s
> sys 0m9.998s
>
> STANDARD RUN (no cache, no multi-threading)
> real 0m32.133s
> user 0m30.581s
> sys 0m2.299s
>
> ALL_CORES + 512M
> real 0m13.830s
> user 0m51.083s
> sys 0m1.911s
>
> With a 512M cache, performance improves significantly, but with larger caches
> (1G, 2G, 4G), execution time increases.
>
> *LST_Day_1km.tif*
>
> ALL_CORES + 512M
> real 0m42.863s
> user 0m44.105s
> sys 0m3.583s
>
> STANDARD RUN (no cache, no multi-threading)
> real 0m45.121s
> user 0m26.477s
> sys 0m3.712s
>
> ALL_CORES + 2G
> real 0m37.548s
> user 0m48.302s
> sys 0m8.113s
>
> ALL_CORES + 4G
> real 0m51.845s
> user 0m48.213s
> sys 0m7.988s
>
> For this dataset, using a 2G cache improves performance, but increasing it to
> 4G makes processing slower.
>
> *Questions:*
>
> 1. How does GDAL's caching mechanism impact performance in these scenarios?
> 2. Why does increasing cache size sometimes degrade performance?
> 3. Is there a recommended way to tune cache settings for MODIS HDF
> processing, considering that some layers (like QC) behave differently from
> others (like LST_1Km)?
>
> Any insights into how GDAL handles multi-threading and caching internally
> would be greatly appreciated.
>
> Thanks in advance for your help!
>
> Best regards,
>
> Varisht Ghedia