paleolimbot opened a new issue, #137:
URL: https://github.com/apache/sedona-db/issues/137

   Currently our non-Parquet IO depends on GeoPandas or DuckDB. These are great 
workarounds, but they don't leverage the generic pushdown/pruning capability 
that DataFusion gives us.
   
   While we could hook directly into GDAL, building, linking, and packaging 
GDAL isn't something I'd like to do specifically for vector support if there 
are any alternatives. It's possible that specifically for OGR support we might 
be able to do something similar to our PROJ support (dynamically pulling 
symbols at runtime), but that won't scale beyond a very limited set of operations.
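
For readers unfamiliar with the approach, "dynamically pulling symbols" means locating a shared library at runtime and resolving individual C functions from it, instead of linking against it at build time. The Python `ctypes` sketch below only illustrates that idea; it is not how the existing PROJ support is written. It assumes a system GDAL shared library is discoverable via `ctypes.util.find_library("gdal")` and resolves one real GDAL C API function, `GDALVersionInfo`. Each additional OGR function would need its own hand-declared signature like this one, which is why the approach does not scale much beyond a small set of operations.

```python
import ctypes
import ctypes.util
from typing import Optional


def gdal_release_name() -> Optional[str]:
    """Return GDAL's release name (e.g. "3.8.4"), or None if GDAL isn't found.

    Assumption: the shared library is discoverable under the name "gdal".
    This is an illustration, not a robust library search.
    """
    path = ctypes.util.find_library("gdal")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None

    # Resolve one symbol at runtime and declare its C signature by hand:
    #   const char* GDALVersionInfo(const char* pszRequest);
    version_info = lib.GDALVersionInfo
    version_info.argtypes = [ctypes.c_char_p]
    version_info.restype = ctypes.c_char_p

    result = version_info(b"RELEASE_NAME")
    return result.decode("utf-8") if result is not None else None


if __name__ == "__main__":
    # Prints None when no GDAL shared library is available on this machine.
    print(gdal_release_name())
```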
   
   Since GDAL 3.6, an ArrowArrayStream interface has been provided for reading 
OGR layers. Our DataFrames support arbitrary ArrowArrayStream input, although 
we would need to make them more resilient to multiple collects (e.g., today if 
you create a data frame and `.show()` it twice, the second call fails because 
the array stream has already been consumed). We also need to wire in pushdown 
support, which GDAL does very well (e.g., using the embedded spatial index of 
shapefile/fgb/gpkg).
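
One way to make a data frame tolerate multiple collects is to hold on to a way of *reopening* the stream rather than the single-use stream itself, and to hand any pushed-down filter to that reopen call so the source (GDAL, in the real case) can apply it. The sketch below illustrates the pattern with plain Python iterators standing in for ArrowArrayStreams; `ReplayableSource`, `open_batches`, and the `(xmin, ymin, xmax, ymax)` bbox convention are hypothetical names chosen for illustration, not existing sedona-db or GDAL APIs.

```python
from typing import Callable, Iterator, List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Row = Tuple[str, float, float]            # (name, x, y)


class ReplayableSource:
    """Hypothetical wrapper that stores a stream *factory*, not a stream.

    Each scan() calls the factory again, so collecting twice (e.g. calling
    .show() twice) works even though each individual stream is single-use.
    """

    def __init__(self, open_stream: Callable[[Optional[BBox]], Iterator[List[Row]]]):
        self._open_stream = open_stream

    def scan(self, bbox: Optional[BBox] = None) -> Iterator[List[Row]]:
        # The filter is passed down to the source, which can use it to skip
        # data entirely (GDAL could consult a spatial index at this point).
        return self._open_stream(bbox)


# Stand-in data; in practice this would come from an OGR layer.
CITIES: List[Row] = [
    ("Vatican City", 12.4533865, 41.9032822),
    ("San Marino", 12.4417702, 43.9360958),
    ("Lobamba", 31.1999971, -26.4666675),
]


def open_batches(bbox: Optional[BBox]) -> Iterator[List[Row]]:
    """Hypothetical reader: yields one-row batches, applying the bbox filter."""
    for name, x, y in CITIES:
        if bbox is None or (bbox[0] <= x <= bbox[2] and bbox[1] <= y <= bbox[3]):
            yield [(name, x, y)]


# A bare stream is exhausted after the first pass.
one_shot = open_batches(None)
first_pass = list(one_shot)
second_pass = list(one_shot)  # empty: the stream was already consumed

# The factory-backed source can be scanned repeatedly, with or without a filter.
source = ReplayableSource(open_batches)
first = [row for batch in source.scan() for row in batch]
second = [row for batch in source.scan() for row in batch]
europe = [row[0] for batch in source.scan((0.0, 35.0, 30.0, 60.0)) for row in batch]
```

The design choice here is that the data frame owns "how to open the data" plus the filter, rather than an already-open handle, which is what makes repeated execution and per-query pushdown possible.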
   
   Proof of concept:
   
   ```python
   import pyogrio.raw
   import sedona.db
   
   sd = sedona.db.connect()
   
   url = "https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities.fgb"
   with pyogrio.raw.ogr_open_arrow(f"/vsicurl/{url}", {}) as info:
       meta, reader = info
       print(meta)
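       # Collect into memory inside the `with` block, while the dataset is open;
       # the underlying stream can only be read once
       df = sd.create_data_frame(reader).to_memtable()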
   
   df.show(5)
   #> {'crs': 'EPSG:4326', 'encoding': 'UTF-8', 'fields': array(['name'], dtype=object), 'geometry_type': 'Point', 'geometry_name': '', 'fid_column': 'OGC_FID'}
   #> ┌──────────────┬───────────────────────────────┐
   #> │     name     ┆          wkb_geometry         │
   #> │     utf8     ┆            geometry           │
   #> ╞══════════════╪═══════════════════════════════╡
   #> │ Vatican City ┆ POINT(12.4533865 41.9032822)  │
   #> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
   #> │ San Marino   ┆ POINT(12.4417702 43.9360958)  │
   #> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
   #> │ Vaduz        ┆ POINT(9.5166695 47.1337238)   │
   #> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
   #> │ Lobamba      ┆ POINT(31.1999971 -26.4666675) │
   #> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
   #> │ Luxembourg   ┆ POINT(6.1300028 49.6116604)   │
   #> └──────────────┴───────────────────────────────┘
   ```

