paleolimbot commented on code in PR #268: URL: https://github.com/apache/sedona-db/pull/268#discussion_r2487256626
########## rust/sedona-raster-functions/src/executor.rs: ########## @@ -0,0 +1,188 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use arrow_array::{Array, ArrayRef, StructArray}; +use datafusion_common::error::Result; +use datafusion_common::{DataFusionError, ScalarValue}; +use datafusion_expr::ColumnarValue; +use sedona_raster::array::{RasterRefImpl, RasterStructArray}; +use sedona_schema::datatypes::SedonaType; +use sedona_schema::datatypes::RASTER; + +/// Helper for writing raster kernel implementations +/// +/// The [RasterExecutor] provides a simplified interface for executing functions +/// on raster arrays, handling the common pattern of downcasting to StructArray, +/// creating raster iterators, and handling null values. +pub struct RasterExecutor<'a, 'b> { + pub arg_types: &'a [SedonaType], + pub args: &'b [ColumnarValue], + num_iterations: usize, +} + +impl<'a, 'b> RasterExecutor<'a, 'b> { Review Comment: I wonder if we can simplify this a bit because not all the reasons the "visit" thing exists for Geometry/Geography apply here. For Wkb the items could be very small and the per-iteration cost matters a lot...also we encode geometry in potentially a lot of different ways (even though binary view and regular binary are currently the only two we care about). For raster the per-iteration cost is is less of a factor and we have exactly one representation of it in our engine, so things like the double dynamic function pointer calls of `Box<dyn Iterator<Item = &dyn RasterRef>>` aren't an issue like they might be for Wkb iteration. I am not sure if the lifetimes will work out, but maybe something like: ```rust pub trait IntoRasterIterator { fn into_raster_iterator(&self, num_iterations: usize) -> Box<dyn Iterator<Item = &dyn RasterRef>>; } impl IntoRasterIterator for ScalarValue { .. } impl IntoRasterIterator for ArrayRef { .. } impl IntoRasterIterator for ColumnarValue { .. } ``` ...would be easier to mix and match with functions that iterate over geometry and raster together? ########## rust/sedona-raster-functions/src/register.rs: ########## @@ -0,0 +1,51 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +use sedona_expr::function_set::FunctionSet; + +/// Export the set of functions defined in this crate +pub fn default_function_set() -> FunctionSet { + let mut function_set = FunctionSet::new(); + + macro_rules! register_scalar_udfs { + ($function_set:expr, $($udf:expr),* $(,)?) => { + $( + $function_set.insert_scalar_udf($udf()); + )* + }; + } + + macro_rules! register_aggregate_udfs { + ($function_set:expr, $($udf:expr),* $(,)?) => { + $( + $function_set.insert_aggregate_udf($udf()); + )* + }; + } + + register_scalar_udfs!(function_set, crate::rs_size::rs_width_udf,); + + register_aggregate_udfs!(function_set,); + + function_set +} + +/// Functions whose implementations are registered independently +/// +/// These functions are included in the default function set; however, +/// it is useful to expose them individually for testing in crates that +/// implement them. +pub mod stubs {} Review Comment: ```suggestion ``` (We can probably punt on these until they're needed?) ########## rust/sedona-raster/src/array.rs: ########## @@ -621,4 +628,41 @@ mod tests { assert_eq!(band_values, vec![0, 1, 2]); } + + #[test] + fn test_raster_is_null() { + let mut builder = RasterBuilder::new(2); + let metadata = RasterMetadata { + width: 5, + height: 5, + upperleft_x: 0.0, + upperleft_y: 0.0, + scale_x: 1.0, + scale_y: -1.0, + skew_x: 0.0, + skew_y: 0.0, + }; Review Comment: Can we use `generate_test_rasters()` here? ########## rust/sedona-raster-functions/src/rs_size.rs: ########## @@ -0,0 +1,128 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +use std::{sync::Arc, vec}; + +use crate::executor::RasterExecutor; +use arrow_array::builder::UInt64Builder; +use arrow_schema::DataType; +use datafusion_common::error::Result; +use datafusion_expr::{ + scalar_doc_sections::DOC_SECTION_OTHER, ColumnarValue, Documentation, Volatility, +}; +use sedona_expr::scalar_udf::{SedonaScalarKernel, SedonaScalarUDF}; +use sedona_raster::traits::RasterRef; +use sedona_schema::{datatypes::SedonaType, matchers::ArgMatcher}; + +/// RS_Width() scalar UDF implementation +/// +/// Extract the width of the raster +pub fn rs_width_udf() -> SedonaScalarUDF { + SedonaScalarUDF::new( + "rs_width", + vec![Arc::new(RsWidth {})], + Volatility::Immutable, + Some(rs_width_doc()), + ) +} + +fn rs_width_doc() -> Documentation { + Documentation::builder( + DOC_SECTION_OTHER, + "Return the width component of a raster".to_string(), + "RS_Width(raster: Raster)".to_string(), + ) + .with_argument("raster", "Raster: Input raster") + .with_sql_example("SELECT RS_Width(raster)".to_string()) + .build() +} + +#[derive(Debug)] +struct RsWidth {} + +impl SedonaScalarKernel for RsWidth { + fn return_type(&self, args: &[SedonaType]) -> Result<Option<SedonaType>> { + let matcher = ArgMatcher::new( + vec![ArgMatcher::is_raster()], + SedonaType::Arrow(DataType::UInt64), + ); + + matcher.match_args(args) + } + + fn invoke_batch( + &self, + arg_types: &[SedonaType], + args: &[ColumnarValue], + ) -> Result<ColumnarValue> { + let executor = RasterExecutor::new(arg_types, args); + let mut builder = UInt64Builder::with_capacity(executor.num_iterations()); + Review Comment: Without the executor this might be something like: ```rust let executor = WkbExecutor::new(arg_types, args); let mut builder = UInt64Builder::with_capacity(executor.num_iterations()); for raster_ref in args[0].into_raster_iterator(executor.num_iterations()) { } executor.finish(Arc::new(builder.finish())) ``` (I know using the name `WkbExecutor` is a fishy...I'm open to better ideas 🙂 ) ########## rust/sedona-raster-functions/src/rs_size.rs: ########## @@ -0,0 +1,128 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +use std::{sync::Arc, vec}; + +use crate::executor::RasterExecutor; +use arrow_array::builder::UInt64Builder; +use arrow_schema::DataType; +use datafusion_common::error::Result; +use datafusion_expr::{ + scalar_doc_sections::DOC_SECTION_OTHER, ColumnarValue, Documentation, Volatility, +}; +use sedona_expr::scalar_udf::{SedonaScalarKernel, SedonaScalarUDF}; +use sedona_raster::traits::RasterRef; +use sedona_schema::{datatypes::SedonaType, matchers::ArgMatcher}; + +/// RS_Width() scalar UDF implementation +/// +/// Extract the width of the raster +pub fn rs_width_udf() -> SedonaScalarUDF { + SedonaScalarUDF::new( + "rs_width", + vec![Arc::new(RsWidth {})], + Volatility::Immutable, + Some(rs_width_doc()), + ) +} + +fn rs_width_doc() -> Documentation { + Documentation::builder( + DOC_SECTION_OTHER, + "Return the width component of a raster".to_string(), + "RS_Width(raster: Raster)".to_string(), + ) + .with_argument("raster", "Raster: Input raster") + .with_sql_example("SELECT RS_Width(raster)".to_string()) Review Comment: No need to do this right now, but a raster constructor would help here (I don't know if one exists, or maybe we can make a `RS_Example()` in the future to avoid typing extra long strings here). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
