OK, that now makes sense. Writing a .fgb file is one of those exceptions where RAM consumption can matter, because it involves building a packed Hilbert R-tree in memory. With the current implementation, you need at least the number of features times some constant amount of RAM, just to store the list of each feature's bounding box plus its offset in a temporary file. From what I can see, this constant is at least 40 bytes. So in your particular case this requires at least 145459485 * 40 bytes = about 5.8 GB (5.4 GiB) of RAM, and probably (I'm not totally sure) twice that to hold both this initial list and the tree itself. I guess the implementation could be made smarter and use on-disk temporary storage, but that would likely involve serious implementation complications. I'll let Björn comment further if he is following this discussion.
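A rough back-of-the-envelope sketch of that estimate, under stated assumptions: each entry is taken to be four float64 bounding-box coordinates plus a uint64 offset (which gives the 40 bytes above), the tree is assumed to use a node size (branching factor) of 16, and the initial list and the complete tree are assumed to be resident at the same time. ENTRY_BYTES, packed_rtree_node_count and estimate_index_ram are hypothetical names for illustration, not GDAL code.

```python
import math

# Assumption: 4 float64 bbox coordinates + 1 uint64 file offset per entry.
ENTRY_BYTES = 4 * 8 + 8  # 40 bytes


def packed_rtree_node_count(num_items: int, node_size: int = 16) -> int:
    """Count entries in a packed R-tree: the leaf level plus every parent level
    up to a single root. node_size=16 is an assumed branching factor."""
    if num_items <= 0:
        return 0
    total = num_items
    level = num_items
    while level > 1:
        # Each parent level groups node_size children per node.
        level = math.ceil(level / node_size)
        total += level
    return total


def estimate_index_ram(num_items: int, node_size: int = 16) -> dict:
    """Estimate bytes for the flat item list and the tree, assuming both
    are held in memory simultaneously."""
    items = num_items * ENTRY_BYTES
    tree = packed_rtree_node_count(num_items, node_size) * ENTRY_BYTES
    return {"item_list_bytes": items, "tree_bytes": tree, "total_bytes": items + tree}


if __name__ == "__main__":
    est = estimate_index_ram(145_459_485)  # feature count of bfp_USA
    for name, value in est.items():
        print(f"{name}: {value / 1e9:.2f} GB ({value / 2**30:.2f} GiB)")
```

Under these assumptions, the item list alone is about 5.8 GB and the tree adds about 6.2 GB, so roughly 12 GB in total. That is consistent with the "twice that" guess and well above what an 8 GB machine can hold.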

I've submitted a doc enhancement to mention this requirement: https://github.com/OSGeo/gdal/pull/8490

On 28/09/2023 at 19:17, Scott wrote:
USA.fgb is 36 GB. I renamed it from the original file, which can be found here:
https://beta.source.coop/vida/google-microsoft-open-buildings

ogr2ogr -sql "select area_in_meters from bfp_USA" -nln footprints footprints.fgb ~/Downloads/USA.fgb

GDAL 3.7.1
OS Debian Buster

Output from ogrinfo -ro -al USA.fgb

Layer name: bfp_USA
Geometry: Unknown (any)
Feature Count: 145459485
Extent: (-160.221701, 17.677691) - (-64.583428, 71.360579)
Layer SRS WKT:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["unknown"],
        AREA["World"],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
boundary_id: Integer64 (0.0)
bf_source: String (0.0)
confidence: Real (0.0)
area_in_meters: Real (0.0)
OGRFeature(bfp_USA):0
  boundary_id (Integer64) = 116
  bf_source (String) = google
  confidence (Real) = 0.906
  area_in_meters (Real) = 187.4652
  POLYGON ((-64.6399621676723 17.7225504518464,-64.6400377660957 17.722583049763,-64.6400238635835 17.7226126625647,-64.6400901719124 17.7226412545727,-64.640104074415 17.722611641767,-64.6401239848718 17.7226202271066,-64.6401528522526 17.7225587385527,-64.6400955687758 17.7225340380511,-64.6401051288881 17.7225136746756,-64.6400401136221 17.7224856402151,-64.640030553504 17.7225060035881,-64.6399910351014 17.7224889633119,-64.6399621676723 17.7225504518464))

OGRFeature(bfp_USA):1
  boundary_id (Integer64) = 116
  bf_source (String) = microsoft
  area_in_meters (Real) = 51.0777955237376
  POLYGON ((-64.6398677811851 17.7219759840792,-64.6397939789141 17.7219853127982,-64.6398020235506 17.7220430591893,-64.6398758258215 17.7220337304732,-64.6398677811851 17.7219759840792))

OGRFeature(bfp_USA):2
  boundary_id (Integer64) = 116
  bf_source (String) = google
  confidence (Real) = 0.8323
  area_in_meters (Real) = 178.5448
  POLYGON ((-64.6397672401299 17.7220665249078,-64.6397654280552 17.722041016034,-64.6395789582891 17.7220531822569,-64.6395832735872 17.7221139302758,-64.6396967374623 17.7221065273415,-64.639698399651 17.7221299263498,-64.6398064310524 17.7221228777942,-64.6398022655579 17.7220642396531,-64.6397672401299 17.7220665249078))


On 9/28/23 10:03, Even Rouault wrote:

On 28/09/2023 at 18:47, Scott via gdal-dev wrote:

I should have been more specific.

One particular machine has 8 GB of memory. When I run even the simplest ogr2ogr command on large files, the host runs out of memory (vmstat shows this) and ogr2ogr terminates with 'Killed', nothing more.

The data formats I have experienced this with are .fgb, .parquet and .gpkg. The data files are tens of GB.

As input? As output? Which operating system? Which GDAL version? The output of "ogrinfo -al -so the_input" might also be helpful. An exact ogr2ogr command line that triggers the issue would certainly be useful. In general, most GDAL drivers and ogr2ogr itself operate in streaming mode with low RAM requirements, but there might be exceptions (for example, some GeoJSON file layouts may require full ingestion on reading). I'm also aware of issues with RAM fragmentation due to how some memory allocators work, but they seem to be restricted to multithreaded uses (https://gdal.org/user/multithreading.html#ram-fragmentation-and-multi-threading), which current ogr2ogr shouldn't trigger.

Even


Thanks for the responses!
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev

--
http://www.spatialys.com
My software is free, but my time generally not.

