Frank,

I'm actually surprised that it is considered impractical to make some/most drivers thread-safe (read-only) natively. I thought key ones were already mostly there in the past, though I'm not so confident in my memory.
Things have got more sophisticated over time with more deferred loading strategies, but I don't think any driver has been close to being thread-safe, even in read-only scenarios, except maybe the RasterIO part of RawDataset, which has pretty much no mutable state (if you exclude file operations and anything involving the block cache).
 Looking through the GTiff code, I find myself unable to work out where locking is done and how we manage the directory loading/switching as block reads are done.
No locking is done. Note that the logic of TIFF directory loading/switching changed "a few" years ago. In the previous situation, the main dataset and its overview datasets shared the same TIFF* handle. But that strategy of switching TIFF directories when switching between overview datasets had a major performance drawback: it invalidated the TIFF bytecount[] and offset[] arrays, which was particularly annoying when computing overviews, where that back-and-forth happens constantly. And that inconvenience increased with the size of datasets and of those arrays. But even though there are now several TIFF* handles, they do share the same VSILFILE* underneath, and there's a layer between the driver and libtiff that ensures the file pointer is restored to the position a TIFF* handle expects when that handle is activated, that flushes are done properly so that each TIFF* handle sees a consistent state on disk (for read/write scenarios), etc.
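
For illustration only (hypothetical names, read-only simplification, not the actual GTiff/libtiff glue code), the idea of that layer is roughly:

/* Hypothetical, read-only sketch of the idea: each TIFF* handle remembers
 * the offset it believes the file is at, and its read callback re-seeks
 * the shared VSILFILE* if another handle moved the file pointer. */
#include "cpl_vsi.h"

struct SharedFileState
{
    VSILFILE    *fp;          // the single underlying file handle
    vsi_l_offset nCurOffset;  // where the file pointer actually is
};

struct PerTIFFHandleState
{
    SharedFileState *poShared;
    vsi_l_offset     nExpectedOffset;  // where this TIFF* handle thinks it is
};

static size_t ReadForHandle(PerTIFFHandleState *psState, void *pBuf,
                            size_t nBytes)
{
    // Restore the position only if another TIFF* handle moved it.
    if (psState->poShared->nCurOffset != psState->nExpectedOffset)
        VSIFSeekL(psState->poShared->fp, psState->nExpectedOffset, SEEK_SET);
    const size_t nRead = VSIFReadL(pBuf, 1, nBytes, psState->poShared->fp);
    psState->nExpectedOffset += nRead;
    psState->poShared->nCurOffset = psState->nExpectedOffset;
    return nRead;
}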

I would certainly like to work towards read-only thread safety for some drivers with:
 - a shared block cache for all threads reading from a dataset
The easiest way of doing that would be to put a per-dataset mutex in the GDALHashSetBandBlockCache and GDALArrayBandBlockCache classes, but that would hurt scalability even when several threads try to access different blocks at the same time (unless smart things are done to drop the mutex as soon as possible).
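
Just to make the trade-off concrete, a naive sketch of that per-dataset mutex approach (not the actual GDALHashSetBandBlockCache code) would look like:

// Naive sketch of a per-dataset mutex around block lookup/insertion:
// correct, but every thread serializes on the same lock even when
// touching different blocks, so it has to be released as early as possible.
#include <map>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

struct Block
{
    std::vector<unsigned char> data;
};

class NaiveThreadSafeBlockCache
{
    std::mutex m_oMutex;  // one lock for the whole dataset
    std::map<std::pair<int, int>, std::shared_ptr<Block>> m_oBlocks;

  public:
    std::shared_ptr<Block> GetCachedBlock(int nXBlock, int nYBlock)
    {
        std::lock_guard<std::mutex> oLock(m_oMutex);
        auto it = m_oBlocks.find({nXBlock, nYBlock});
        return it != m_oBlocks.end() ? it->second : nullptr;
    }

    void AddBlock(int nXBlock, int nYBlock, std::shared_ptr<Block> poBlock)
    {
        std::lock_guard<std::mutex> oLock(m_oMutex);
        m_oBlocks[{nXBlock, nYBlock}] = std::move(poBlock);
    }
};
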
 - no duplication of file handles per-thread

I just don't see how to do that. Even a mutex around file operations would not be sufficient, since we'd want the Seek()+Read() pair to be atomic per thread. PRead() would solve that, but very few drivers or underlying libraries use PRead() (and it isn't implemented for all file systems; the Windows file API, for example, lacks it). Or one would need a VSIThreadSafeHandle that maintains a per-thread position, but that would still involve locking around access to the underlying handle.
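
To make the atomicity point concrete, here is a small sketch (hypothetical helper functions), contrasting a mutex that must cover both the Seek() and the Read() with a pread()-style call that never touches the shared file position:

/* The lock has to cover Seek() and Read() together, otherwise another
 * thread can move the shared file pointer in between; pread() takes the
 * offset as an argument, so it needs no lock at all. */
#include <mutex>
#include <unistd.h>   // POSIX pread(); not available everywhere
#include "cpl_vsi.h"

static std::mutex gFileMutex;

// Locked variant on a shared VSILFILE*.
size_t LockedReadAt(VSILFILE *fp, vsi_l_offset nOffset, void *pBuf,
                    size_t nBytes)
{
    std::lock_guard<std::mutex> oLock(gFileMutex);
    if (VSIFSeekL(fp, nOffset, SEEK_SET) != 0)
        return 0;
    return VSIFReadL(pBuf, 1, nBytes, fp);
}

// pread() variant on a raw POSIX file descriptor: no lock needed.
ssize_t PReadAt(int fd, off_t nOffset, void *pBuf, size_t nBytes)
{
    return pread(fd, pBuf, nBytes, nOffset);
}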

libtiff itself has a lot of state variables, in its core or in its codecs (e.g. the deflate codec has to maintain a zlib handle as a state variable if doing partial reads of a tile/strip). Even getting the location of a TIFF tile is not thread-safe when deferred bytecount[]/offset[] array loading is enabled (which is necessary when dealing with extremely large remote COGs), since it involves file accesses.

You could raise the point that the GTiff driver does have multi-threaded read optimizations, but it takes a lot of precautions to make that happen safely, and it has the advantage of fully controlling the conditions under which that happens. So basically it starts by fetching the location and size of the needed blocks; then each thread reads the needed blocks (either using PRead() if available, or taking a mutex around file accesses), creates a per-thread TIFF* handle, replicates the strategic TIFF tags from the main TIFF handle so that the codecs are initialized with the appropriate state, and uses the TIFFReadFromUserBuffer() call that was added to libtiff to decode an in-memory encoded buffer (without doing file accesses). And even with that, I remember fixing, last year or so, a case that I had totally missed initially, where the block cache was accidentally triggered in one of those worker threads. So yes, multi-threading is hard to get right.
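
A very simplified sketch of that flow (not the actual GTiff code, which clones the strategic tags into each per-thread handle, handles errors, partial tiles, the block cache, etc.):

#include <cstdint>
#include <vector>
#include "tiffio.h"

struct TileJob
{
    uint32_t nTileIndex;
    uint64_t nOffset;
    uint64_t nByteCount;
};

// Main thread, before any worker starts: collect tile locations (this may
// do file accesses if deferred strile loading is enabled).
static void CollectJobs(TIFF *hMainTIFF, const std::vector<uint32_t> &tiles,
                        std::vector<TileJob> &jobs)
{
    for (uint32_t nTile : tiles)
    {
        TileJob sJob;
        sJob.nTileIndex = nTile;
        sJob.nOffset = TIFFGetStrileOffset(hMainTIFF, nTile);
        sJob.nByteCount = TIFFGetStrileByteCount(hMainTIFF, nTile);
        jobs.push_back(sJob);
    }
}

// Worker thread: hThreadTIFF is a per-thread handle set up with the same
// strategic tags as the main one; pabyEncoded has already been read from
// the file (with pread(), or under a mutex) into memory.
static bool DecodeTile(TIFF *hThreadTIFF, const TileJob &sJob,
                       void *pabyEncoded, void *pabyDecoded,
                       size_t nDecodedSize)
{
    return TIFFReadFromUserBuffer(hThreadTIFF, sJob.nTileIndex, pabyEncoded,
                                  static_cast<tmsize_t>(sJob.nByteCount),
                                  pabyDecoded,
                                  static_cast<tmsize_t>(nDecodedSize)) != 0;
}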

 - a way of checking that a dataset is considered to be thread safe.
Not totally sure I understand what you mean here. The RFC introduces an IsThreadSafe() virtual method that a driver can potentially implement (currently only set to true by instances of GDALThreadSafeDataset). But determining whether a driver is thread-safe seems to be an unsolvable/unprovable problem to me, apart from staring at the code for a sufficiently long time to convince oneself that it actually is, plus adding some torture tests and hoping they catch most issues. And things can get very complicated if you need to test all combinations of virtual methods that can be called...
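
For what it's worth, the intended caller-side usage would be something like the following (hypothetical sketch; the IsThreadSafe(nScopeFlags) signature with a GDAL_OF_RASTER scope flag is taken from the RFC draft and may still change):

#include "gdal_priv.h"

static void ReadConcurrentlyIfPossible(GDALDataset *poDS)
{
    if (poDS->IsThreadSafe(GDAL_OF_RASTER))
    {
        // Several threads may call poDS->RasterIO() concurrently.
    }
    else
    {
        // Fall back to a single reader thread, or one dataset per thread.
    }
}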

I'm curious if you can talk a bit more about the challenges of accomplishing the above. I had been hoping to take advantage of my assumption that this already mostly existed in GDAL for our new high-performance environment at Planet, so it is of more than academic interest to me.

Implementing a restricted read-only thread-safe TIFF driver would probably be workable: one addressing only the common cases of TIFF, without all the edge cases GDAL and libtiff try to handle, for a controlled set of codecs, assuming pread() is always available, assuming you always read full tiles and strips, re-implementing a very simplified libtiff, and doing it by hand. Perhaps also excluding raster queries at resolutions that don't exactly match one of the overview levels, so as to completely bypass the block cache. But that would be a separate driver, with a more restricted set of functionality than our current GDAL GeoTIFF driver, which handles all the possible cases.
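
Something in the spirit of (purely hypothetical skeleton, not an existing driver):

/* Everything is parsed once at open time into immutable members, so
 * concurrent ReadTile() calls only do a pread() and a decode with
 * per-call codec state, with no block cache at all. */
#include <cstdint>
#include <unistd.h>   // POSIX pread()
#include <vector>

class MiniCOGReader
{
    int                   m_fd = -1;         // opened once at construction
    std::vector<uint64_t> m_tileOffsets;     // filled at open, then read-only
    std::vector<uint64_t> m_tileByteCounts;  // filled at open, then read-only
    // ... image/tile dimensions, codec id, etc., all immutable after open

  public:
    // Thread-safe: only reads immutable members and uses pread().
    bool ReadTile(size_t nTile, std::vector<uint8_t> &encoded) const
    {
        if (nTile >= m_tileOffsets.size())
            return false;
        encoded.resize(static_cast<size_t>(m_tileByteCounts[nTile]));
        const ssize_t nRead =
            pread(m_fd, encoded.data(), encoded.size(),
                  static_cast<off_t>(m_tileOffsets[nTile]));
        // Decoding (e.g. zlib inflate) would happen here with codec state
        // local to this call, never shared between threads.
        return nRead == static_cast<ssize_t>(encoded.size());
    }
};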

  Also, doesn't gdalwarp's multithreading support already depend on read-only thread safety for source files?

Definitely not. The initial -multi mode works with 2 threads: one for I/O, one for compute. This was later extended to also multi-thread the compute part. But I/O is still done by a single thread, hence no requirement on the thread-safety of drivers.
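
Schematically (hypothetical code, not the actual gdalwarp source), the pattern is: a single thread owns the source dataset and does all the reads, and compute threads only consume in-memory buffers, so no driver thread-safety is required.

#include <condition_variable>
#include <mutex>
#include <queue>
#include <vector>

struct Chunk
{
    std::vector<unsigned char> pixels;
};

static std::mutex gQueueMutex;
static std::condition_variable gQueueCV;
static std::queue<Chunk> gQueue;
static bool gDone = false;

static void IOThread(int nChunks /* , GDALDataset *poSrcDS */)
{
    for (int i = 0; i < nChunks; ++i)
    {
        Chunk c;
        // poSrcDS->RasterIO(...) would fill c.pixels here; only this
        // thread ever calls into the driver.
        {
            std::lock_guard<std::mutex> oLock(gQueueMutex);
            gQueue.push(std::move(c));
        }
        gQueueCV.notify_one();
    }
    {
        std::lock_guard<std::mutex> oLock(gQueueMutex);
        gDone = true;
    }
    gQueueCV.notify_all();
}

static void ComputeThread()
{
    for (;;)
    {
        std::unique_lock<std::mutex> oLock(gQueueMutex);
        gQueueCV.wait(oLock, [] { return !gQueue.empty() || gDone; });
        if (gQueue.empty())
            return;  // done and drained
        Chunk c = std::move(gQueue.front());
        gQueue.pop();
        oLock.unlock();
        // warp/resample c.pixels here, in parallel with further reads
    }
}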

Even

--
http://www.spatialys.com
My software is free, but my time generally not.