On 28/10/2024 at 17:01, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND
APPLICATIONS INC] via gdal-dev wrote:
I have two calls to gdal.Rasterize, each of which target a separate
GDAL memory dataset but source the same OGR memory dataset, that I
hoped could be run in parallel using Python’s concurrent.futures. The
idea being that each GDAL call unlocks the Python GIL, and performing
read only operations on the vector database (except for storing memory
for the results) could in principle be a safe and effective
optimization, as the feature layers themselves are not mutated. The
SQL dialect is SQLite, so presumably the OGR dataset has to be
converted to a SQLite (memory) database. Technically SQLite supports
multiple readers just fine, but this doesn’t mean GDAL/OGR does. The
multithreading documentation page doesn’t explicitly mention OGR /
vector datasets but I presume they inherit similar stateful
restrictions (Yes RFC 101 is coming). However, running these SQL
queries at the same time causes OGR to trip over itself (I presume
OGR assumes only one query statement is being evaluated at a time).
So I think the intended workaround is either: accept this as a
serially dependent task, or copy the dataset and have each Rasterize()
work on a copy, yes?
It's not clear to me whether you share one Python source vector dataset
object across threads, or open your source dataset once per thread. The
first case is a big no-no: anything could happen, including wrong results
and crashes.
One object per thread is the way to go. If the processing is very
intensive on acquiring source features, you may hit a global lock at the
SQLite level, but there isn't much we can do about that. Or you need to
use multi-processing parallelization instead of multi-threading. But you
certainly don't need to copy your source dataset.
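
To illustrate "one object per thread", here is a minimal sketch and not a
tested recipe. It assumes the source vector data lives somewhere that can
be opened by name from each thread; VECTOR_PATH, QUERIES, BOUNDS, RES and
the helper names are hypothetical placeholders. Each worker thread opens
its own handle lazily and never shares it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placeholders, not taken from the original message.
VECTOR_PATH = "input.gpkg"
QUERIES = ["SELECT * FROM roads", "SELECT * FROM rivers"]
BOUNDS = [0.0, 0.0, 100.0, 100.0]  # minx, miny, maxx, maxy
RES = 1.0


def run_per_thread(open_source, work, jobs, max_workers=2):
    """Run work(source, job) for each job; every worker thread gets its own source."""
    local = threading.local()

    def task(job):
        # Open lazily, once per worker thread; the handle is never shared.
        if not hasattr(local, "src"):
            local.src = open_source()
        return work(local.src, job)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, jobs))  # results come back in job order


def open_vector():
    from osgeo import gdal  # imported lazily so the helper above stays GDAL-free
    return gdal.OpenEx(VECTOR_PATH, gdal.OF_VECTOR)


def rasterize(src_ds, sql):
    from osgeo import gdal
    # Each call writes to its own in-memory raster and reads from the
    # dataset handle owned by the calling thread.
    return gdal.Rasterize("", src_ds, format="MEM",
                          SQLStatement=sql, SQLDialect="SQLite",
                          outputBounds=BOUNDS, xRes=RES, yRes=RES,
                          burnValues=[1])


# Intended use (requires GDAL and a real file at VECTOR_PATH):
#   rasters = run_per_thread(open_vector, rasterize, QUERIES)
```

If the workload is dominated by fetching features and contends on a SQLite
lock, the same structure should carry over to a process pool, with each
process opening its own handle.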
In the same spirit as RFC 101, which gives some thread safety to
raster read-only workloads, is there interest in expanding this to
vector datasets?
That would be tricky. What would be the expected result if a user
called GetNextFeature() on a thread-safe OGRLayer: would users expect
each thread to see all features, or would features be distributed among
the calling threads?
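
The two possible behaviours can be made concrete with a toy model. This is
a sketch in plain Python with no GDAL involved; the class and function
names are made up for illustration and do not correspond to any OGR API.

```python
import threading

FEATURES = list(range(10))  # stand-ins for feature IDs


class SharedCursorLayer:
    """One read cursor shared by all threads: features are split among callers."""

    def __init__(self, features):
        self._it = iter(features)
        self._lock = threading.Lock()

    def get_next_feature(self):
        with self._lock:
            return next(self._it, None)


class PerThreadCursorLayer:
    """One read cursor per thread: every thread sees every feature."""

    def __init__(self, features):
        self._features = features
        self._local = threading.local()

    def get_next_feature(self):
        if not hasattr(self._local, "it"):
            self._local.it = iter(self._features)
        return next(self._local.it, None)


def drain(layer):
    """Read until the layer reports no more features for this caller."""
    out = []
    while True:
        feature = layer.get_next_feature()
        if feature is None:
            return out
        out.append(feature)


def read_from_two_threads(layer):
    """Drain the same layer object from two distinct threads."""
    results = [None, None]

    def worker(i):
        results[i] = drain(layer)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With the shared cursor, the two result lists together contain each feature
exactly once, but which thread gets which feature is not predictable. With
per-thread cursors, both lists contain the full set.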
Even
--
http://www.spatialys.com
My software is free, but my time generally not.
Butcher of all kinds of standards, open or closed formats. At the end, this is
just about bytes.