On Mon, 16 Jul 2018, Paul Meems wrote:

Thanks, Jon, for your suggestion of GeoPandas.
Unfortunately, I'm not allowed to add new external dependencies.

Some timing:
1,677 shapes --> 0.3 s
4,810 shapes --> 1.8 s
18,415 shapes --> 21.4 s
72,288 shapes --> 5 min 54 s
285,927 shapes --> 25 min
1,139,424 shapes --> 6 h 47 min
4,557,696 shapes --> still running after 34 h

4 million shapes is the amount my application needs to handle, but running
for days is not an option.
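As an aside, the runtimes above grow much faster than the shape count. A quick least-squares fit on the log-log data (a rough sketch, treating 25 min as 1500 s and 6 h 47 min as 24420 s) estimates the scaling exponent:

```python
# Rough check of how runtime scales with shape count, using the
# timings quoted above (25 min taken as 1500 s, 6 h 47 min as 24420 s).
import math

shapes = [1_677, 4_810, 18_415, 72_288, 285_927, 1_139_424]
seconds = [0.3, 1.8, 21.4, 354.0, 1_500.0, 24_420.0]

# Ordinary least-squares slope on the log-log data: if t ~ n**k,
# the slope of log(t) against log(n) estimates the exponent k.
xs = [math.log(n) for n in shapes]
ys = [math.log(t) for t in seconds]
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
k = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
    (x - mx) ** 2 for x in xs
)
print(f"estimated exponent: {k:.2f}")
```

The exponent comes out well above 1 (roughly 1.7), i.e. the cost per shape itself grows with dataset size, which may point at something algorithmic on top of any I/O bottleneck.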

I noticed my script is using only a fraction of my resources: 30% RAM (of
12GB), 22-28% CPU (on 8 cores).
How can I let GDAL use more resources? Might it speed up the process?

If you aren't using most of your CPU or memory, I'd guess that reading from
or writing to disk is the bottleneck. I'm not sure whether OGR uses
GDAL_CACHEMAX, but you could try
    export GDAL_CACHEMAX=12288
to make GDAL use 12 GB of cache (the default is 5% of RAM in recent GDAL
versions, 40 MB in older ones).
If the bottleneck is in SQLite you might be able to do something equivalent
there. If the bottleneck is writing the file, perhaps a RAM disk might make
sense?
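Since the poster's script apparently drives GDAL from Python, one way to set these knobs is through environment variables before osgeo.gdal/osgeo.ogr is first imported, as GDAL reads its configuration options from the environment. A minimal sketch (OGR_SQLITE_CACHE and OGR_SQLITE_SYNCHRONOUS are the SQLite driver's config options; the values here are illustrative, not tuned):

```python
# Set GDAL/OGR configuration options via the environment; they need to be
# in place before osgeo.gdal / osgeo.ogr is first imported.
import os

os.environ["GDAL_CACHEMAX"] = "12288"         # GDAL block cache, in MB (12 GB)
os.environ["OGR_SQLITE_CACHE"] = "1024"       # SQLite page cache, in MB
os.environ["OGR_SQLITE_SYNCHRONOUS"] = "OFF"  # don't fsync on every commit

# from osgeo import ogr   # import only after the options are set
```

Wrapping many inserts in a single transaction (one CommitTransaction per batch rather than per feature) is usually the other big SQLite win.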

--
Andrew C. Aitchison                                     Cambridge, UK
                        [email protected]
_______________________________________________
gdal-dev mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/gdal-dev
