I can confirm that the TAB dataset structure uses 512-byte data blocks organized in a tree. Reading from the file therefore implies a lot of random access across the whole file, even when features are read sequentially, because a single feature is stored in multiple data blocks of various types (feature header blocks, feature coordinate blocks, etc.). It would be interesting to know whether VSI_CACHE, as suggested by Even, helps.
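
To make the access pattern concrete, here is a toy sketch in Python. It uses Python's own zipfile module, not GDAL's /vsizip/ implementation, and the member name "data.map" and the block count are made up for illustration. It reads 512-byte blocks in scattered order from a plain buffer and from a deflated zip member. Both return the same bytes, but a deflate stream has no random access: in CPython's zipfile, a backward seek restarts decompression from the beginning of the member, so scattered reads cost far more than they do on an uncompressed file.

```python
import io
import random
import zipfile

BLOCK = 512        # block size mentioned above
N_BLOCKS = 2000    # hypothetical file size, in blocks

# Fill each block with its own index so it is easy to check what was read.
payload = b"".join(i.to_bytes(4, "little") * (BLOCK // 4) for i in range(N_BLOCKS))

# Build an in-memory zip holding one deflated member (name is hypothetical).
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.map", payload)


def read_blocks(fileobj, block_ids):
    """Seek to each block and read it, in the order given."""
    blocks = []
    for block_id in block_ids:
        fileobj.seek(block_id * BLOCK)
        blocks.append(fileobj.read(BLOCK))
    return blocks


# A scattered access order, standing in for header/coordinate block lookups.
order = list(range(N_BLOCKS))
random.Random(0).shuffle(order)
order = order[:200]

plain = read_blocks(io.BytesIO(payload), order)
with zipfile.ZipFile(archive) as zf, zf.open("data.map") as member:
    # Every backward seek here makes zipfile decompress again from offset 0.
    zipped = read_blocks(member, order)

print("blocks read:", len(zipped), "identical:", plain == zipped)
```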

Daniel

On 2022-07-27 11:55, Even Rouault wrote:

Moises,

I haven't reviewed the MITAB driver in depth, but reading from a .tab file may require random access, so it is not surprising that reading from a compressed file performs poorly. You might try setting the VSI_CACHE config option / environment variable to YES, but there is no guarantee this will help for your use case.
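
One possible way to try this is to set the option in the process environment before launching ogr2ogr. The sketch below only assembles the command and environment and prints them; it does not run GDAL. The helper name is hypothetical, and VSI_CACHE_SIZE (the cache size in bytes) is an additional GDAL config option that was not part of the suggestion above, so the value used here is only an assumption to experiment with.

```python
import os
import shlex


def ogr2ogr_with_vsi_cache(source, cache_bytes=None):
    """Return (argv, env) for the ogr2ogr call from this thread with VSI_CACHE on.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ)
    env["VSI_CACHE"] = "YES"  # the option suggested above
    if cache_bytes is not None:
        # Assumption: a larger cache than the default might help; untested here.
        env["VSI_CACHE_SIZE"] = str(cache_bytes)
    argv = [
        "ogr2ogr", "-f", "CSV", "-skipfailures", "-makevalid",
        "/vsistdout/", source,
        "-oo", "AUTODETECT_TYPE=YES", "-lco", "CREATE_CSVT=YES",
    ]
    return argv, env


argv, env = ogr2ogr_with_vsi_cache("/vsizip/onLDU.zip", cache_bytes=100_000_000)
print("VSI_CACHE=%s VSI_CACHE_SIZE=%s %s"
      % (env["VSI_CACHE"], env["VSI_CACHE_SIZE"], shlex.join(argv)))

# To actually run it (requires GDAL's ogr2ogr on PATH):
#   import subprocess
#   with open("test_2.csv", "wb") as out:
#       subprocess.run(argv, env=env, stdout=out, check=True)
```

Comparing the run time with and without the variable set would show whether the cache makes a difference for this file.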

Even

On 27/07/2022 at 11:39, Moises Calzado via gdal-dev wrote:
Hi everyone!

We're using ogr2ogr to convert MapInfo TAB files into CSV format using the following command:

    ogr2ogr -f CSV -skipfailures -makevalid /vsistdout/ \
        /vsizip/onLDU.zip -oo AUTODETECT_TYPE=YES -lco CREATE_CSVT=YES \
        > test_2.csv


The file weighs ≈200 MB and the process takes a very long time to finish (almost 20 min), so we're wondering whether something is wrong with the command we're running.
However, if we launch the same command against the .tab file instead of using the vsizip virtual file system, it takes less than 30 seconds to complete.

Have you ever seen something like this? Is it expected that this kind of file takes so long to process, or are we doing something wrong?

Thanks so much for your help in advance,
Regards!
--
*Moises Calzado*

Support Engineer

(US) +1 917 463 3232 | (ES) +34 911 165 823 | [email protected]

<https://spatial-data-science-conference.com/2022/newyork/>

_______________________________________________
gdal-dev mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/gdal-dev
--
http://www.spatialys.com
My software is free, but my time generally not.

--
Daniel Morissette
Mapgears Inc
T: +1 418-696-5056 #201