isaaccorley opened a new pull request, #479:
URL: https://github.com/apache/sedona-db/pull/479
Adds support for reading GeoParquet files from Azure Blob Storage.
## Changes
- New `azure` feature flag in `rust/sedona/Cargo.toml` enabling `object_store/azure`
- `AzureOptions` struct with fields for common auth methods:
  - `account_name`, `sas_token`, `access_key`
  - `bearer_token`, `client_id`, `client_secret`, `tenant_id`, `authority_id`
- URL scheme support for `az://`, `abfs://`, `abfss://`
- Fixed GeoParquet metadata parsing for files missing the `geometry_types` field (e.g., MS Building Footprints)
- Fixed URL extension detection to strip query parameters before checking the file type
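The URL and metadata changes above can be illustrated with a small Python sketch. It is not the Rust implementation in this PR; `AZURE_SCHEMES`, `is_azure_url`, `file_extension`, and `geometry_types` are hypothetical helpers that mirror the described behavior, and treating a missing `geometry_types` like an empty list (which the GeoParquet spec uses to mean "types unknown") is an assumption about how the fix behaves.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical: the URL schemes this PR routes to the Azure object store
AZURE_SCHEMES = {"az", "abfs", "abfss"}


def is_azure_url(url: str) -> bool:
    """Return True if the URL uses one of the Azure schemes listed in the PR."""
    # urlsplit lowercases the scheme, so "ABFSS://" also matches
    return urlsplit(url).scheme in AZURE_SCHEMES


def file_extension(url: str) -> str:
    """Return the lowercase file extension of the URL's path component.

    urlsplit moves everything after '?' (e.g. a SAS token) into .query,
    so the query string no longer leaks into the extension.
    """
    path = urlsplit(url).path
    return posixpath.splitext(path)[1].lower()


def geometry_types(column_meta: dict) -> list:
    """Read a GeoParquet column's geometry_types, tolerating its absence.

    Assumption: a missing field is handled like an empty list, which the
    GeoParquet spec defines as "geometry types not known".
    """
    return list(column_meta.get("geometry_types", []))
```

Without the query-stripping step, a naive `os.path.splitext` on a SAS-signed URL returns `.parquet?sv=...` rather than `.parquet`, so the file type check fails.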
## Motivation
I wanted to query Microsoft Planetary Computer datasets (MS Building
Footprints, etc.) directly from SedonaDB. These datasets are hosted on Azure
Blob Storage and use SAS token authentication.
## Usage
```python
import sedonadb

sd = sedonadb.connect()
df = sd.read_parquet(
    "abfss://[email protected]/path/",
    options={
        "azure.account_name": "blobstorage",
        "azure.sas_token": "sv=2023-01-03&st=...",
    },
)
```
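The example above uses SAS token auth. For the other `AzureOptions` fields, a hypothetical helper like `azure_options` below could assemble the `options` dict. It assumes every field maps to an `azure.<field>` key, as `account_name` and `sas_token` do in the example; the PR does not show keys for the other fields, so treat them as unverified.

```python
# Field names taken from the AzureOptions list in this PR.
# Assumption: each is exposed to Python as an "azure.<field>" option key.
AZURE_OPTION_FIELDS = {
    "account_name", "sas_token", "access_key",
    "bearer_token", "client_id", "client_secret", "tenant_id", "authority_id",
}


def azure_options(**fields: str) -> dict:
    """Hypothetical helper: build read_parquet options from AzureOptions fields.

    Rejects unknown field names so typos fail early instead of being ignored.
    """
    unknown = set(fields) - AZURE_OPTION_FIELDS
    if unknown:
        raise ValueError(f"unsupported Azure option(s): {sorted(unknown)}")
    return {f"azure.{name}": value for name, value in fields.items()}
```

For service-principal auth, the result would be passed as `options=azure_options(account_name=..., client_id=..., client_secret=..., tenant_id=...)` to `sd.read_parquet`; whether SedonaDB accepts these exact keys should be checked against the PR.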
## Testing
- Ran `pre-commit run --all-files`
- `cargo clippy --workspace --all-targets --all-features -- -Dwarnings`
- `cargo test -p sedona -p sedona-geoparquet --all-features` (86 tests pass)