findepi commented on code in PR #13637:
URL: https://github.com/apache/datafusion/pull/13637#discussion_r1871373752
##########
datafusion-examples/examples/advanced_udf.rs:
##########
@@ -191,6 +199,51 @@ impl ScalarUDFImpl for PowUdf {
}
}
+/// Evaluate `base ^ exp` *without* allocating a new array, if possible
+fn pow_in_place(base: f64, exp_array: ArrayRef) -> Result<ArrayRef> {
+ // Calling `unary` creates a new array for the results. Avoiding
+ // allocations is a common optimization in performance critical code.
+ // arrow-rs allows this optimization via the `unary_mut`
+ // and `binary_mut` kernels in certain cases
+ //
+ // These kernels can only be used if there are no other references to
+ // the arrays (exp_array has to be the last remaining reference).
+ let owned_array = exp_array
+ // as before we downcast to Float64Array
+ .as_primitive::<Float64Type>()
+ // non-obviously, we call clone here to get an owned `Float64Array`.
+ // Calling clone() is relatively inexpensive as it increments
+ // some ref counts but doesn't clone the data)
+ //
+ // Once we have the owned Float64Array we can drop the original
+ // exp_array (untyped) reference
+ .clone();
+
+ // We *MUST* drop this reference explicitly so that owned_array
+ // is the last remaining reference
+ drop(exp_array);
+
+ // at this point, exp_array is the only reference in this function.
However,
+ // depending on the query there may be other references to this array.
+ //
+ // If we have the only reference, compute the result
+ // directly into the same allocation as was used for the input array
+ match compute::unary_mut(owned_array, |exp| base.powf(exp)) {
+ Err(_orig_array) => {
+ // unary_mut will return the original array if there are other
+ // references into the underling buffer (and thus reuse is
+ // impossible)
+ //
+ // In a real implementation, this case should fall back to
+ // calling `unary` and allocate a new array; In our example
+ // we will return an error for demonstration purposes
+ exec_err!("Could not reuse array for pow_in_place")
Review Comment:
What if my pow udf is called in a query like
```
select pow(10, f), f from ..
```
in such case, re-use shouldn't be possible, unless there was some
(redundant) data copying before the pow call.
##########
datafusion-examples/examples/advanced_udf.rs:
##########
@@ -191,6 +199,51 @@ impl ScalarUDFImpl for PowUdf {
}
}
+/// Evaluate `base ^ exp` *without* allocating a new array, if possible
+fn pow_in_place(base: f64, exp_array: ArrayRef) -> Result<ArrayRef> {
+ // Calling `unary` creates a new array for the results. Avoiding
+ // allocations is a common optimization in performance critical code.
+ // arrow-rs allows this optimization via the `unary_mut`
+ // and `binary_mut` kernels in certain cases
+ //
+ // These kernels can only be used if there are no other references to
+ // the arrays (exp_array has to be the last remaining reference).
+ let owned_array = exp_array
+ // as before we downcast to Float64Array
Review Comment:
what does "before" refer to here? code above, or previous version of the
code?
##########
datafusion-examples/examples/advanced_udf.rs:
##########
@@ -191,6 +199,51 @@ impl ScalarUDFImpl for PowUdf {
}
}
+/// Evaluate `base ^ exp` *without* allocating a new array, if possible
+fn pow_in_place(base: f64, exp_array: ArrayRef) -> Result<ArrayRef> {
Review Comment:
so it's "pow maybe in place", not "pow in place"
worth reflecting in the function name (eg with addition of "maybe")?
otherwise i'd call this just `pow`
##########
datafusion-examples/examples/advanced_udf.rs:
##########
@@ -191,6 +199,51 @@ impl ScalarUDFImpl for PowUdf {
}
}
+/// Evaluate `base ^ exp` *without* allocating a new array, if possible
+fn pow_in_place(base: f64, exp_array: ArrayRef) -> Result<ArrayRef> {
+ // Calling `unary` creates a new array for the results. Avoiding
+ // allocations is a common optimization in performance critical code.
+ // arrow-rs allows this optimization via the `unary_mut`
+ // and `binary_mut` kernels in certain cases
+ //
+ // These kernels can only be used if there are no other references to
+ // the arrays (exp_array has to be the last remaining reference).
+ let owned_array = exp_array
+ // as before we downcast to Float64Array
+ .as_primitive::<Float64Type>()
+ // non-obviously, we call clone here to get an owned `Float64Array`.
+ // Calling clone() is relatively inexpensive as it increments
+ // some ref counts but doesn't clone the data)
+ //
+ // Once we have the owned Float64Array we can drop the original
+ // exp_array (untyped) reference
Review Comment:
This function is called with `ArrayRef`.
How did we ensure this was the only reference?
##########
datafusion-examples/examples/advanced_udf.rs:
##########
@@ -191,6 +199,51 @@ impl ScalarUDFImpl for PowUdf {
}
}
+/// Evaluate `base ^ exp` *without* allocating a new array, if possible
+fn pow_in_place(base: f64, exp_array: ArrayRef) -> Result<ArrayRef> {
+ // Calling `unary` creates a new array for the results. Avoiding
+ // allocations is a common optimization in performance critical code.
+ // arrow-rs allows this optimization via the `unary_mut`
+ // and `binary_mut` kernels in certain cases
+ //
+ // These kernels can only be used if there are no other references to
+ // the arrays (exp_array has to be the last remaining reference).
+ let owned_array = exp_array
+ // as before we downcast to Float64Array
+ .as_primitive::<Float64Type>()
+ // non-obviously, we call clone here to get an owned `Float64Array`.
+ // Calling clone() is relatively inexpensive as it increments
+ // some ref counts but doesn't clone the data)
+ //
+ // Once we have the owned Float64Array we can drop the original
+ // exp_array (untyped) reference
+ .clone();
+
+ // We *MUST* drop this reference explicitly so that owned_array
+ // is the last remaining reference
+ drop(exp_array);
+
+ // at this point, exp_array is the only reference in this function.
However,
Review Comment:
at this point, exp_array doesn't exist anymore.
and we still don't know whether we have only reference, or shared.
we know it only based on return result from `compute::unary_mut`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]