jesspav opened a new issue, #247:
URL: https://github.com/apache/sedona-db/issues/247

   Hi folks,
   
   Pulled together requirements and a design for an efficient, flexible 
in-memory model for raster data that aligns with SedonaDB's architecture.  
Also built a quick prototype with the designed schema, as well as some 
accessors and some functions that use them.  
   
   Looking forward to your feedback.
   
   
   ## Data to Store
   
   ### Raster Metadata
   Standard raster metadata fields in Sedona, Havasu, and similar systems:
   - **Width**: Pixel count along X axis
   - **Height**: Pixel count along Y axis
   - **UpperleftX** / **UpperleftY**: Upper-left corner coordinates (CRS units)
   - **ScaleX** / **ScaleY**: Cell scaling factors
   - **SkewX** / **SkewY**: Cell skew parameters
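   
   As a rough illustration of how these fields fit together, the sketch below maps a pixel index to CRS coordinates. It assumes the usual GDAL/PostGIS affine convention; the `GeoTransform` type and its field names are hypothetical and are not taken from the prototype.
   
   ```rust
   /// Georeferencing fields from the metadata list (hypothetical names).
   #[derive(Debug, Clone, Copy, PartialEq)]
   pub struct GeoTransform {
       pub upper_left_x: f64,
       pub upper_left_y: f64,
       pub scale_x: f64,
       pub scale_y: f64,
       pub skew_x: f64,
       pub skew_y: f64,
   }
   
   impl GeoTransform {
       /// Map a (column, row) pixel index to the CRS coordinates of that
       /// pixel's upper-left corner, assuming the GDAL/PostGIS convention.
       pub fn pixel_to_world(&self, col: f64, row: f64) -> (f64, f64) {
           let x = self.upper_left_x + col * self.scale_x + row * self.skew_x;
           let y = self.upper_left_y + col * self.skew_y + row * self.scale_y;
           (x, y)
       }
   }
   ```
   
   For a north-up raster the skew terms are zero and ScaleY is typically negative, so rows advance downward from the upper-left corner.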
   
   ### Bounding Box
   - Optional WGS84 bounding box, used to speed up spatial queries
   
   ### CRS
   - Full CRS info (including SRID)
   
   ### Raster Bands
   Bands store metadata and, for in-db, the data itself.  We may want to expand 
this to include statistical data about the bands as well.
   
   - **NoDataValue**: The no data value
   - **Storage Types**:
     - OutDB Ref: External reference
     - InDB: In-memory values on the band
     - InDB Reference: Memory pool storage (outside Arrow array) - not in the 
initial release
   - **Data Types**: Standard data types including UInt8, Int32, Float32, etc. 
Initial version will not include complex types, but we would like to include 
these in the future
   - **OutDB Metadata:**  URL + band id
   - **Compression Type:** The compression type of the band
   - **Data** 
   
   **Structure:**
   ```
   Raster
   ├─ Metadata
   ├─ BBox
   ├─ CRS
   └─ Bands
       └─ Band
           ├─ Metadata
           ├─ Statistics (optional)
           └─ Data
   ```
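   
   The tree above could be mirrored with plain Rust types as in the following sketch. This is not the Arrow schema from the prototype: every type and field name here is hypothetical, compression type is left out for brevity, and storage is simplified to the two variants planned for the initial release.
   
   ```rust
   /// A few of the "standard data types" named above (no complex types).
   #[derive(Debug, Clone, Copy, PartialEq)]
   pub enum PixelType {
       UInt8,
       Int32,
       Float32,
   }
   
   impl PixelType {
       /// Size of one pixel value in bytes.
       pub fn byte_width(self) -> usize {
           match self {
               PixelType::UInt8 => 1,
               PixelType::Int32 | PixelType::Float32 => 4,
           }
       }
   }
   
   /// Where the band's pixels live.
   #[derive(Debug, Clone, PartialEq)]
   pub enum BandStorage {
       /// In-db: raw pixel bytes held on the band.
       InDb(Vec<u8>),
       /// Out-db: URL plus band id of an external raster.
       OutDbRef { url: String, band_id: u32 },
   }
   
   /// Optional per-band statistics.
   #[derive(Debug, Clone, PartialEq)]
   pub struct BandStats {
       pub min: f64,
       pub max: f64,
       pub mean: f64,
   }
   
   #[derive(Debug, Clone, PartialEq)]
   pub struct BandSketch {
       pub nodata_value: Option<f64>,
       pub pixel_type: PixelType,
       pub statistics: Option<BandStats>,
       pub storage: BandStorage,
   }
   
   #[derive(Debug, Clone, PartialEq)]
   pub struct RasterSketch {
       pub width: usize,
       pub height: usize,
       pub upper_left: (f64, f64),
       pub scale: (f64, f64),
       pub skew: (f64, f64),
       /// Optional WGS84 bounding box as [xmin, ymin, xmax, ymax].
       pub bbox: Option<[f64; 4]>,
       pub crs: Option<String>,
       pub bands: Vec<BandSketch>,
   }
   
   impl RasterSketch {
       /// Bytes an uncompressed in-db band of this raster should hold.
       pub fn expected_band_bytes(&self, band: &BandSketch) -> usize {
           self.width * self.height * band.pixel_type.byte_width()
       }
   }
   ```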
   
   ## Apache Arrow Arrays
   SedonaDB leverages Apache Arrow for speed and efficiency:
   - Immutable: Metadata updates require array copies
   - Columnar: Fast metadata queries
   - Typed: Runtime validation
   - Zero-Copy: Language interoperability
   - Null Handling: Efficient bitmaps
   - Batch/Vectorized: SIMD-ready operations
   
   **StructArrays:**
   - Struct arrays separate fields as independent child arrays, enabling 
flexible queries and future-proofing.
   
   
   See the schema prototype using StructArrays: 
[sedona-schema/src/datatypes.rs#L368-L481](https://github.com/jesspav/sedona-db/blob/prototype_raster/rust/sedona-schema/src/datatypes.rs#L368-L481)
   
   ## Design Considerations
   
   ### Access Patterns
   - **Loading/Writing:** `RS_FromGeoTiff`, `RS_AsGeoTiff` operate on full 
raster objects
   - **Aggregators:** Functions like `RS_Union` merge ArrowArrays, creating new 
ones
   - **Predicates:** Metadata (esp. bounding box) can be stored in dedicated 
columns for fast queries
   - **Array-Based Operators:** Optional per-band statistics (min, max, mean) 
enable shortcut computations
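   
   To make the predicate case concrete, here is a minimal sketch of a bounding-box filter over a columnar layout, using plain vectors in place of Arrow arrays. `BBoxColumns` and `intersecting` are hypothetical names used only for illustration.
   
   ```rust
   /// Columnar bounding boxes: one vector per field, one entry per raster.
   pub struct BBoxColumns {
       pub xmin: Vec<f64>,
       pub ymin: Vec<f64>,
       pub xmax: Vec<f64>,
       pub ymax: Vec<f64>,
   }
   
   impl BBoxColumns {
       /// Return the row indices whose bounding box intersects the query
       /// window given as (xmin, ymin, xmax, ymax). Only the four bbox
       /// columns are read; band data is never touched.
       pub fn intersecting(&self, query: (f64, f64, f64, f64)) -> Vec<usize> {
           let (qxmin, qymin, qxmax, qymax) = query;
           (0..self.xmin.len())
               .filter(|&i| {
                   self.xmin[i] <= qxmax
                       && self.xmax[i] >= qxmin
                       && self.ymin[i] <= qymax
                       && self.ymax[i] >= qymin
               })
               .collect()
       }
   }
   ```
   
   Rows that fail the test can be skipped before any pixel data is read, which is the main reason to keep the bounding box in its own columns.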
   
   ### Tiling
   - Large rasters are split into smaller tiles for performance and scalability
   - Each tile is a smaller raster whose width/height are set to the tile 
size, whose upper-left corner is adjusted to the tile's origin, and whose 
bands hold the corresponding subset of the data
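   
   A rough sketch of the tiling arithmetic follows, assuming the GDAL/PostGIS affine convention and row-major, one-byte-per-pixel band data. `TileGeoref`, `split_into_tiles`, and `copy_window` are hypothetical names, not functions from the prototype.
   
   ```rust
   /// Size and georeferencing of a raster or tile (hypothetical names).
   #[derive(Debug, Clone, PartialEq)]
   pub struct TileGeoref {
       pub width: usize,
       pub height: usize,
       pub upper_left_x: f64,
       pub upper_left_y: f64,
       pub scale_x: f64,
       pub scale_y: f64,
       pub skew_x: f64,
       pub skew_y: f64,
   }
   
   /// Split a raster into tiles of at most tile_w x tile_h pixels, in
   /// row-major tile order. Edge tiles are clipped to the parent extent.
   pub fn split_into_tiles(parent: &TileGeoref, tile_w: usize, tile_h: usize) -> Vec<TileGeoref> {
       assert!(tile_w > 0 && tile_h > 0, "tile size must be non-zero");
       let mut tiles = Vec::new();
       for row_off in (0..parent.height).step_by(tile_h) {
           for col_off in (0..parent.width).step_by(tile_w) {
               let (c, r) = (col_off as f64, row_off as f64);
               tiles.push(TileGeoref {
                   width: tile_w.min(parent.width - col_off),
                   height: tile_h.min(parent.height - row_off),
                   // Move the origin to the tile's first pixel.
                   upper_left_x: parent.upper_left_x + c * parent.scale_x + r * parent.skew_x,
                   upper_left_y: parent.upper_left_y + c * parent.skew_y + r * parent.scale_y,
                   ..parent.clone()
               });
           }
       }
       tiles
   }
   
   /// Copy a w x h pixel window out of row-major, one-byte-per-pixel data.
   pub fn copy_window(
       data: &[u8],
       parent_width: usize,
       col_off: usize,
       row_off: usize,
       w: usize,
       h: usize,
   ) -> Vec<u8> {
       let mut out = Vec::with_capacity(w * h);
       for r in row_off..row_off + h {
           let start = r * parent_width + col_off;
           out.extend_from_slice(&data[start..start + w]);
       }
       out
   }
   ```
   
   Scale and skew carry over unchanged because tiling only moves the origin; pixel size and orientation stay the same.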
   
   ### GDAL Integration
   - Arrow buffers can be mapped directly to slices for GDAL if the types match 
and there are no nulls
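   
   The "types match and no nulls" condition might be checked along these lines before handing a buffer to GDAL. The sketch uses only the standard library and makes no GDAL or Arrow calls; `view_as_f32` is a hypothetical helper and `null_count` stands in for whatever the Arrow array reports.
   
   ```rust
   /// View raw band bytes as f32 pixels without copying. Returns None if
   /// the band has nulls, the length is not a whole number of f32 values,
   /// or the buffer is not aligned for f32.
   pub fn view_as_f32(bytes: &[u8], null_count: usize) -> Option<&[f32]> {
       let size = std::mem::size_of::<f32>();
       let aligned = (bytes.as_ptr() as usize) % std::mem::align_of::<f32>() == 0;
       if null_count != 0 || bytes.len() % size != 0 || !aligned {
           return None;
       }
       // SAFETY: the pointer is aligned for f32, the length covers a whole
       // number of f32 values, every bit pattern is a valid f32, and the
       // returned slice borrows from `bytes`.
       Some(unsafe { std::slice::from_raw_parts(bytes.as_ptr() as *const f32, bytes.len() / size) })
   }
   ```
   
   When the check fails, the caller would have to fall back to a copy or a type conversion instead of the zero-copy path.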
   
   ### Vectorized Processing
   - SIMD ops on band data; metadata in columnar layout enables rapid predicate 
evaluation
   - RecordBatch sizing controls row/column orientation
   
   ### Compression
   - Expect to expand the design later for per-band compression, since columnar 
data compresses well
   
   ## Prototype
   https://github.com/jesspav/sedona-db/pull/2/files
   
   ## References
   - [SedonaType enum and Arrow 
integration](https://github.com/apache/sedona-db/blob/main/rust/sedona-schema/src/datatypes.rs)
   - [GeoArrow C 
integration](https://github.com/apache/sedona-db/blob/main/c/sedona-geoarrow-c/src/geoarrow_c.rs)
   - [Arrow array schema 
handling](https://github.com/apache/sedona-db/blob/main/python/sedonadb/src/udf.rs)
   - [Sedona Raster 
Functions](https://sedona.apache.org/1.6.1/api/sql/Raster-operators/)
   
   
   

