Hi Even,

 

Thanks for your response. Is there any way to disable this automatic sampling of the latitude and longitude subdatasets?

 

The file I’m accessing is 182.2 MB in size; when I download one of its subdatasets with gdal_translate, I can see on my network that the process transfers roughly the same amount (176 MB received according to Mac’s Activity Monitor). I’m interested in accessing the latitude and longitude subdatasets, so opening those two alone would already result in about 350 MB transferred…
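
For what it’s worth, I’ve been experimenting with the multidimensional raster API as a possible workaround (a minimal sketch, assuming the AWS config options from earlier in the thread are already set; whether this mode really avoids the GCP sampling is an assumption on my part):

    from osgeo import gdal

    gdal.UseExceptions()

    # Open in multidimensional mode rather than the classic raster model;
    # no GCPs are computed here, so the lat/lon arrays should only be read
    # when explicitly requested.
    path = "/vsis3/prod-lads/VNP03IMG/VNP03IMG.A2021065.2324.002.2021127011303.nc"
    ds = gdal.OpenEx(path, gdal.OF_MULTIDIM_RASTER)
    lat = ds.GetRootGroup().OpenGroup("geolocation_data").OpenMDArray("latitude")
    print([dim.GetSize() for dim in lat.GetDimensions()])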

 

Also, when I run gdal_translate with the -sds flag to download all subdatasets, the data transferred greatly exceeds the total file size; it looks like the lat/lon subdatasets get re-downloaded for each of the subdatasets in the file.
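
As an experiment I’m considering enabling GDAL’s generic block cache so those repeated reads are served from memory (a sketch; whether this actually prevents the re-downloads with -sds is an assumption):

    from osgeo import gdal

    # Enable the VSI block cache, sized to hold the whole file, so byte
    # ranges that are requested more than once (e.g. the lat/lon arrays)
    # come from memory instead of being fetched from S3 again.
    with gdal.config_options({
        "VSI_CACHE": "TRUE",
        "VSI_CACHE_SIZE": str(256 * 1024 * 1024),
        "GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR",
    }):
        src = 'HDF5:"/vsis3/prod-lads/VNP02IMG/VNP02IMG.A2021064.2342.002.2021128145323.nc"://observation_data/I04'
        gdal.Translate("I04.tif", src)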

 

Regards,

Bert

 

From: Even Rouault <even.roua...@spatialys.com>
Date: Friday, 21 July 2023 at 15:43
To: b.coer...@mailbox.org <b.coer...@mailbox.org>, gdal-dev@lists.osgeo.org <gdal-dev@lists.osgeo.org>
Subject: Re: [gdal-dev] /vsis3/ on NetCDF from Earthdata

 

On 21/07/2023 at 11:46, b.coer...@mailbox.org wrote:

One more follow-up question:

 

The dataset that I’m interested in contains subdatasets. I can get the info of a subdataset like this:

 

    sub_ds_path = 'HDF5:"/vsis3/prod-lads/VNP02IMG/VNP02IMG.A2021064.2342.002.2021128145323.nc"://observation_data/I04'
    info = gdal.Info(sub_ds_path)

 

This works fine and finishes in a few seconds. However, when I do the same thing for a different dataset (which contains the geolocation of the dataset above):

 

    sub_ds_path = 'HDF5:"/vsis3/prod-lads/VNP03IMG/VNP03IMG.A2021065.2324.002.2021127011303.nc"://geolocation_data/latitude'
    info = gdal.Info(sub_ds_path)

 

This takes about 2.5 minutes, and I can see on my network that Python is downloading data at about 1 MB/s the whole time. The info from this subdataset contains a lot of ground control points, so I tried setting `showGCPs=False`, but that doesn’t solve it. I’m not sure it’s really the GCPs that are causing this (when I save the info as JSON, it is about 750 kB in size).

The second product has georeferencing information, and upon opening one of its subdatasets, GDAL samples the latitude and longitude arrays to expose ground control points, hence it reads those arrays.
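
You can observe this directly: the sampled points end up as GCPs on the dataset (a minimal illustration):

    from osgeo import gdal

    ds = gdal.Open('HDF5:"/vsis3/prod-lads/VNP03IMG/VNP03IMG.A2021065.2324.002.2021127011303.nc"://geolocation_data/latitude')
    # These GCPs are the product of the lat/lon sampling done at open time,
    # which is what triggers the reads you are seeing
    print(ds.GetGCPCount())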

 

Any ideas what else can cause this difference in execution time?

 

Regards,

Bert

 

 

From: gdal-dev <gdal-dev-boun...@lists.osgeo.org> on behalf of b.coerver--- via gdal-dev <gdal-dev@lists.osgeo.org>
Date: Thursday, 20 July 2023 at 11:51
To: Even Rouault <even.roua...@spatialys.com>, gdal-dev@lists.osgeo.org <gdal-dev@lists.osgeo.org>
Subject: Re: [gdal-dev] /vsis3/ on NetCDF from Earthdata

That does it, thank you so much!

 

From: Even Rouault <even.roua...@spatialys.com>
Date: Thursday, 20 July 2023 at 11:44
To: bcoer...@mailbox.org <b.coer...@mailbox.org>, gdal-dev@lists.osgeo.org <gdal-dev@lists.osgeo.org>
Subject: Re: [gdal-dev] /vsis3/ on NetCDF from Earthdata

Bert,

Also set the GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR config option, otherwise the generic open mechanism of GDAL tries to list the content of the VNP02IMG/ directory, and it seems there are tons of files there.

When doing that, I get a result within a few seconds.
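
For example, in Python (the option can also be set as an environment variable):

    from osgeo import gdal

    # Skip the directory listing that the generic open mechanism would
    # otherwise issue against the very large VNP02IMG/ prefix
    gdal.SetConfigOption("GDAL_DISABLE_READDIR_ON_OPEN", "EMPTY_DIR")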

Even

On 20/07/2023 at 09:59, b.coerver--- via gdal-dev wrote:

Hello,

 

I'm trying to access data from NASA's Earthdata S3 buckets, but I get a `"<filename> does not exist in the file system, and is not recognized as a supported dataset name."` error after a long wait (± 50 minutes, during which the process keeps downloading data) when doing the following:

 

    from osgeo import gdal

    gdal_config_options = {
        "AWS_ACCESS_KEY_ID": creds["accessKeyId"],
        "AWS_SESSION_TOKEN": creds["sessionToken"],
        "AWS_SECRET_ACCESS_KEY": creds["secretAccessKey"],
        "AWS_REGION": "us-west-2",
    }

    url = "/vsis3/prod-lads/VNP02IMG/VNP02IMG.A2023193.1942.002.2023194025636.nc"

    for k, v in gdal_config_options.items():
        gdal.SetConfigOption(k, v)

    out = gdal.Info(url)

 

The `creds` variable is a dictionary with temporary credential information that I get from [here](https://data.laadsdaac.earthdatacloud.nasa.gov/s3credentials); you need a free account to get them.
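
For completeness, this is roughly how I fetch them (a sketch; it assumes my Earthdata Login credentials are stored in ~/.netrc, which requests uses automatically when it is redirected to urs.earthdata.nasa.gov):

    import requests

    # Request short-lived S3 credentials; the JSON response contains
    # accessKeyId, secretAccessKey and sessionToken
    resp = requests.get("https://data.laadsdaac.earthdatacloud.nasa.gov/s3credentials")
    resp.raise_for_status()
    creds = resp.json()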

 

When I introduce an error in one of the keys/tokens (e.g. `"AWS_ACCESS_KEY_ID": creds["accessKeyId"] + "x"`), I immediately get a message saying my credentials are unknown, so I do think they are being ingested correctly. I’m using GDAL version 3.7.1.

 

I also managed to download the entire file using `boto3` by doing the following:

 

    import boto3

    client = boto3.client(
        's3',
        aws_access_key_id=creds["accessKeyId"],
        aws_secret_access_key=creds["secretAccessKey"],
        aws_session_token=creds["sessionToken"],
    )

    client.download_file('prod-lads', 'VNP02IMG/VNP02IMG.A2023193.1942.002.2023194025636.nc', 'test.nc')

Any ideas what I'm doing wrong or how to make this work? In the end I'm interested in accessing the file's metadata without downloading the entire file.

 

Regards,

Bert

 

 

-- 
http://www.spatialys.com
My software is free, but my time generally not.
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev