[gdal-dev] /vsis3/ on NetCDF from Earthdata

2023-07-20 Thread b.coerver--- via gdal-dev
Hello,

I'm trying to access data from NASA's Earthdata S3 buckets, but I get a
`" does not exist in the file system, and is not recognized as a supported dataset name."`
error after waiting a long time (± 50 minutes, during which the process keeps
downloading data) when doing the following:

    from osgeo import gdal

    gdal_config_options = {
        "AWS_ACCESS_KEY_ID": creds["accessKeyId"],
        "AWS_SESSION_TOKEN": creds["sessionToken"],
        "AWS_SECRET_ACCESS_KEY": creds["secretAccessKey"],
        "AWS_REGION": "us-west-2",
    }

    url = ""

    for k, v in gdal_config_options.items():
        gdal.SetConfigOption(k, v)

    out = gdal.Info(url)

The `creds` variable is a dictionary with temporary credential information that I get
from [here](https://data.laadsdaac.earthdatacloud.nasa.gov/s3credentials); you need a
free account to get them. When I introduce an error in one of the keys/tokens (e.g.
`"AWS_ACCESS_KEY_ID": creds["accessKeyId"] + "x"`), I immediately get a message saying
my credentials are unknown, so I do think they are being picked up correctly. I'm using
GDAL version 3.7.1.

I also managed to download the entire file using `boto3`:

    import boto3

    client = boto3.client(
        's3',
        aws_access_key_id=creds["accessKeyId"],
        aws_secret_access_key=creds["secretAccessKey"],
        aws_session_token=creds["sessionToken"]
    )
    client.download_file('prod-lads',
                         'VNP02IMG/VNP02IMG.A2023193.1942.002.2023194025636.nc',
                         'test.nc')

Any ideas what I'm doing wrong or how to make this work? In the end I'm interested in
accessing the file's metadata without downloading the entire file.

Regards,
Bert
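The archive stripped the value assigned to `url` above. Judging from the bucket and key
in the `boto3` call, it was presumably the matching `/vsis3/` path; a minimal sketch of
the intended call, with the path reconstructed on that assumption:

    from osgeo import gdal

    # Temporary Earthdata credentials, assumed already fetched as in the
    # message above.
    for k, v in {
        "AWS_ACCESS_KEY_ID": creds["accessKeyId"],
        "AWS_SESSION_TOKEN": creds["sessionToken"],
        "AWS_SECRET_ACCESS_KEY": creds["secretAccessKey"],
        "AWS_REGION": "us-west-2",
    }.items():
        gdal.SetConfigOption(k, v)

    # Reconstructed /vsis3/ path: bucket and key taken from the boto3
    # download_file() call; the original value was lost in the archive.
    url = "/vsis3/prod-lads/VNP02IMG/VNP02IMG.A2023193.1942.002.2023194025636.nc"

    info = gdal.Info(url)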
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev


Re: [gdal-dev] /vsis3/ on NetCDF from Earthdata

2023-07-20 Thread b.coerver--- via gdal-dev
That does it, thank you so much!

From: Even Rouault
Date: Thursday, 20 July 2023 at 11:44
To: bcoer...@mailbox.org, gdal-dev@lists.osgeo.org
Subject: Re: [gdal-dev] /vsis3/ on NetCDF from Earthdata

Bert,

Also set the GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR config option; otherwise the
generic open mechanism of GDAL tries to list the content of the VNP02IMG/ directory,
and it seems there are tons of files there. When doing that, I get a result within a
few seconds.

Even

On 20/07/2023 at 09:59, b.coerver--- via gdal-dev wrote: [the original message, quoted
in full above]

--
http://www.spatialys.com
My software is free, but my time generally not.
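A minimal sketch of Even's suggestion applied to the snippet from the first message,
assuming the AWS_* config options are already set as shown there (the /vsis3/ path is
again the reconstructed one):

    from osgeo import gdal

    # Prevent GDAL's generic open mechanism from listing the whole
    # VNP02IMG/ prefix when opening a single object.
    gdal.SetConfigOption("GDAL_DISABLE_READDIR_ON_OPEN", "EMPTY_DIR")

    info = gdal.Info(
        "/vsis3/prod-lads/VNP02IMG/VNP02IMG.A2023193.1942.002.2023194025636.nc"
    )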


Re: [gdal-dev] /vsis3/ on NetCDF from Earthdata

2023-07-21 Thread b.coerver--- via gdal-dev
One more follow-up question: the datasets that I'm interested in contain subdatasets.
I can get the info of a subdataset like this:

    sub_ds_path = 'HDF5:"/vsis3/prod-lads/VNP02IMG/VNP02IMG.A2021064.2342.002.2021128145323.nc"://observation_data/I04'
    info = gdal.Info(sub_ds_path)

This works fine and finishes in a few seconds. However, when I do the same thing for a
different dataset (which contains the geolocation of the dataset above):

    sub_ds_path = 'HDF5:"/vsis3/prod-lads/VNP03IMG/VNP03IMG.A2021065.2324.002.2021127011303.nc"://geolocation_data/latitude'
    info = gdal.Info(sub_ds_path)

this takes about 2.5 minutes, and I can see on my network that Python is downloading
data at about 1 MB/s the whole time. The info from this subdataset contains a lot of
ground control points, so I tried setting `showGCPs=False`, but that doesn't solve it.
I'm not sure it's really the GCPs that are causing this (when I save the info as JSON,
it is about 750 kB in size). Any ideas what else could cause this difference in
execution time?

Regards,
Bert

From: gdal-dev on behalf of b.coerver--- via gdal-dev
Date: Thursday, 20 July 2023 at 11:51
To: Even Rouault, gdal-dev@lists.osgeo.org
Subject: Re: [gdal-dev] /vsis3/ on NetCDF from Earthdata

[earlier messages quoted in full above]
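For context, a brief sketch of how subdataset names like the `HDF5:"..."://...` strings
above can be enumerated with the standard GDAL API (the path is the one from this
message; it assumes the AWS_* and GDAL_DISABLE_READDIR_ON_OPEN options from earlier are
set):

    from osgeo import gdal

    ds = gdal.Open(
        "/vsis3/prod-lads/VNP02IMG/VNP02IMG.A2021064.2342.002.2021128145323.nc"
    )
    # Subdataset names and descriptions come back as
    # SUBDATASET_<n>_NAME / SUBDATASET_<n>_DESC metadata pairs.
    for key, value in ds.GetMetadata("SUBDATASETS").items():
        print(key, value)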


Re: [gdal-dev] /vsis3/ on NetCDF from Earthdata

2023-08-30 Thread b.coerver--- via gdal-dev
Hi Even,

Thanks for your response. Is there any way to disable this automatic sampling of the
latitude and longitude subdatasets? The file I'm accessing is 182.2 MB in size; when I
download one of its subdatasets using gdal_translate, I can see on my network that the
process transfers roughly the same amount (176 MB received according to Mac's Activity
Monitor). I'm interested in accessing the latitude and longitude subdatasets, so
opening these two subdatasets would already result in about 350 MB transferred... Also,
when I run gdal_translate to download the file using the -sds flag, the data
transferred greatly exceeds the total file size; it seems like the lat/lon subdatasets
get re-downloaded for each of the subdatasets in the file.

Regards,
Bert

From: Even Rouault
Date: Friday, 21 July 2023 at 15:43
To: b.coer...@mailbox.org, gdal-dev@lists.osgeo.org
Subject: Re: [gdal-dev] /vsis3/ on NetCDF from Earthdata

On 21/07/2023 at 11:46, b.coer...@mailbox.org wrote: [the follow-up question, quoted
in full above]

The second product has georeferencing information, and upon opening of one of its
subdatasets, GDAL samples the latitude and longitude arrays to expose ground control
points, hence it reads those arrays.
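One possible way to sidestep that sampling, sketched here with GDAL's multidimensional
API (gdal.OpenEx with gdal.OF_MULTIDIM_RASTER, available since GDAL 3.1): it opens the
arrays directly rather than through the classic subdataset path that triggers the GCP
sampling. The dataset path is the one from the messages above; config options are
assumed set as before:

    from osgeo import gdal

    # Open the netCDF file with the multidimensional raster API instead of
    # the classic 2D-raster subdataset path.
    ds = gdal.OpenEx(
        "/vsis3/prod-lads/VNP03IMG/VNP03IMG.A2021065.2324.002.2021127011303.nc",
        gdal.OF_MULTIDIM_RASTER,
    )
    root = ds.GetRootGroup()
    lat = root.OpenGroup("geolocation_data").OpenMDArray("latitude")

    # Reading the array transfers (roughly) just the latitude data.
    values = lat.ReadAsArray()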

Re: [gdal-dev] /vsis3/ on NetCDF from Earthdata

2023-08-30 Thread b.coerver--- via gdal-dev
I found a solution in the meantime, so no need to reply anymore: using
gdalmdimtranslate with the -array flag to select the group and array, e.g.
"/geolocation_data/longitude", downloads that array and nothing else (bytes transferred
around 80 MB).

Regards,
Bert

[earlier messages quoted in full above]
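For completeness, a sketch of that solution via the Python binding of gdalmdimtranslate
(gdal.MultiDimTranslate); the source path is the VNP03IMG file from the earlier
messages, the output filename is illustrative, and the AWS_* config options are assumed
set as before:

    from osgeo import gdal

    # Python equivalent of:
    #   gdalmdimtranslate -array "/geolocation_data/longitude" <src> longitude.nc
    gdal.MultiDimTranslate(
        "longitude.nc",
        "/vsis3/prod-lads/VNP03IMG/VNP03IMG.A2021065.2324.002.2021127011303.nc",
        arraySpecs=["/geolocation_data/longitude"],
    )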