Dear GDAL Developers,

I am working on optimizing processing times for MODIS datasets (the LST_Day_1km and QC_Day layers) using pymodis with some modifications. Specifically, I have added flags for:
- Running on all available CPU cores (ALL_CORES)
- Adjusting the GDAL cache size (GDAL_CACHEMAX)

However, I am observing unexpected performance variations: in some cases, increasing the cache size degrades performance instead of improving it. Below are my test results for two different layers from the same tile.

Tile used: MOD11A1.A2025073.h10v10.061.2025074095514.hdf
Target SRS: EPSG:32618, resampled to 30 m

*QC_tile.tif*

ALL_CORES + 2G
real 0m24.199s
user 0m53.352s
sys 0m9.998s

STANDARD RUN (no cache tuning, no multi-threading)
real 0m32.133s
user 0m30.581s
sys 0m2.299s

ALL_CORES + 512M
real 0m13.830s
user 0m51.083s
sys 0m1.911s

With a 512M cache, performance improves significantly, but with larger caches (1G, 2G, 4G) execution time increases.

*LST_Day_1km.tif*

ALL_CORES + 512M
real 0m42.863s
user 0m44.105s
sys 0m3.583s

STANDARD RUN (no cache tuning, no multi-threading)
real 0m45.121s
user 0m26.477s
sys 0m3.712s

ALL_CORES + 2G
real 0m37.548s
user 0m48.302s
sys 0m8.113s

ALL_CORES + 4G
real 0m51.845s
user 0m48.213s
sys 0m7.988s

For this dataset, a 2G cache improves performance, but increasing it to 4G makes processing slower.

*Questions:*

1. How does GDAL's caching mechanism affect performance in these scenarios?
2. Why does increasing the cache size sometimes degrade performance?
3. Is there a recommended way to tune cache settings for MODIS HDF processing, considering that some layers (like QC_Day) behave differently from others (like LST_Day_1km)?

Any insight into how GDAL handles multi-threading and caching internally would be greatly appreciated.

Thanks in advance for your help!

Best regards,
Varisht Ghedia
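P.S. For context, the two flags I added make pymodis issue a warp roughly equivalent to the command line built below. This is only a sketch of my setup, not the exact code: the HDF4 subdataset name and the 512M cache value are illustrative.

```python
# Sketch of the gdalwarp invocation my ALL_CORES / GDAL_CACHEMAX flags
# correspond to (illustrative values, not the actual pymodis code).
hdf = "MOD11A1.A2025073.h10v10.061.2025074095514.hdf"
# HDF4-EOS subdataset path for the LST_Day_1km layer (assumed naming)
subdataset = (
    f'HDF4_EOS:EOS_GRID:"{hdf}":MODIS_Grid_Daily_1km_LST:LST_Day_1km'
)
cmd = [
    "gdalwarp",
    "--config", "GDAL_CACHEMAX", "512",  # block cache in MB (my 512M run)
    "-multi",                            # overlap I/O with warp computation
    "-wo", "NUM_THREADS=ALL_CPUS",       # parallel warp kernel (ALL_CORES)
    "-t_srs", "EPSG:32618",              # target SRS
    "-tr", "30", "30",                   # resample to 30 m
    subdataset,
    "LST_Day_1km.tif",
]
print(" ".join(cmd))
```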
_______________________________________________ gdal-dev mailing list gdal-dev@lists.osgeo.org https://lists.osgeo.org/mailman/listinfo/gdal-dev