alamb commented on code in PR #13637:
URL: https://github.com/apache/datafusion/pull/13637#discussion_r1874553258


##########
datafusion-examples/examples/advanced_udf.rs:
##########
@@ -191,6 +199,51 @@ impl ScalarUDFImpl for PowUdf {
     }
 }
 
+/// Evaluate `base ^ exp` *without* allocating a new array, if possible
+fn pow_in_place(base: f64, exp_array: ArrayRef) -> Result<ArrayRef> {
+    // Calling `unary` creates a new array for the results. Avoiding
+    // allocations is a common optimization in performance critical code.
+    // arrow-rs allows this optimization via the `unary_mut`
+    // and `binary_mut` kernels in certain cases
+    //
+    // These kernels can only be used if there are no other references to
+    // the arrays (exp_array has to be the last remaining reference).
+    let owned_array = exp_array
+        // as before we downcast to Float64Array

Review Comment:
   The code above (aka the previous example) -- I have clarified



##########
datafusion-examples/examples/advanced_udf.rs:
##########
@@ -191,6 +199,51 @@ impl ScalarUDFImpl for PowUdf {
     }
 }
 
+/// Evaluate `base ^ exp` *without* allocating a new array, if possible
+fn pow_in_place(base: f64, exp_array: ArrayRef) -> Result<ArrayRef> {
+    // Calling `unary` creates a new array for the results. Avoiding
+    // allocations is a common optimization in performance critical code.
+    // arrow-rs allows this optimization via the `unary_mut`
+    // and `binary_mut` kernels in certain cases
+    //
+    // These kernels can only be used if there are no other references to
+    // the arrays (exp_array has to be the last remaining reference).
+    let owned_array = exp_array
+        // as before we downcast to Float64Array
+        .as_primitive::<Float64Type>()
+        // non-obviously, we call clone here to get an owned `Float64Array`.
+        // Calling clone() is relatively inexpensive as it increments
+        // some ref counts but doesn't clone the data)
+        //
+        // Once we have the owned Float64Array we can drop the original
+        // exp_array (untyped) reference

Review Comment:
   I tried to clarify this in the comments -- the call to `unary_mut` fails if 
there are other outstanding references.



##########
datafusion-examples/examples/advanced_udf.rs:
##########
@@ -191,6 +199,51 @@ impl ScalarUDFImpl for PowUdf {
     }
 }
 
+/// Evaluate `base ^ exp` *without* allocating a new array, if possible
+fn pow_in_place(base: f64, exp_array: ArrayRef) -> Result<ArrayRef> {

Review Comment:
   Yes, I agree -- renamed to `maybe_pow_in_place`



##########
datafusion-examples/examples/advanced_udf.rs:
##########
@@ -191,6 +199,51 @@ impl ScalarUDFImpl for PowUdf {
     }
 }
 
+/// Evaluate `base ^ exp` *without* allocating a new array, if possible
+fn pow_in_place(base: f64, exp_array: ArrayRef) -> Result<ArrayRef> {
+    // Calling `unary` creates a new array for the results. Avoiding
+    // allocations is a common optimization in performance critical code.
+    // arrow-rs allows this optimization via the `unary_mut`
+    // and `binary_mut` kernels in certain cases
+    //
+    // These kernels can only be used if there are no other references to
+    // the arrays (exp_array has to be the last remaining reference).
+    let owned_array = exp_array
+        // as before we downcast to Float64Array
+        .as_primitive::<Float64Type>()
+        // non-obviously, we call clone here to get an owned `Float64Array`.
+        // Calling clone() is relatively inexpensive as it increments
+        // some ref counts but doesn't clone the data)
+        //
+        // Once we have the owned Float64Array we can drop the original
+        // exp_array (untyped) reference
+        .clone();
+
+    // We *MUST* drop this reference explicitly so that owned_array
+    // is the last remaining reference
+    drop(exp_array);
+
+    // at this point, exp_array is the only reference in this function. 
However,
+    // depending on the query there may be other references to this array.
+    //
+    // If we have the only reference, compute the result
+    // directly into the same allocation as was used for the input array
+    match compute::unary_mut(owned_array, |exp| base.powf(exp)) {
+        Err(_orig_array) => {
+            // unary_mut will return the original array if there are other
+            // references into the underling buffer (and thus reuse is
+            // impossible)
+            //
+            // In a real implementation, this case should fall back to
+            // calling `unary` and allocate a new array; In our example
+            // we will return an error for demonstration purposes
+            exec_err!("Could not reuse array for pow_in_place")

Review Comment:
   yes, that is correct to the best of my understanding



##########
datafusion-examples/examples/advanced_udf.rs:
##########
@@ -191,6 +199,51 @@ impl ScalarUDFImpl for PowUdf {
     }
 }
 
+/// Evaluate `base ^ exp` *without* allocating a new array, if possible
+fn pow_in_place(base: f64, exp_array: ArrayRef) -> Result<ArrayRef> {
+    // Calling `unary` creates a new array for the results. Avoiding
+    // allocations is a common optimization in performance critical code.
+    // arrow-rs allows this optimization via the `unary_mut`
+    // and `binary_mut` kernels in certain cases
+    //
+    // These kernels can only be used if there are no other references to
+    // the arrays (exp_array has to be the last remaining reference).
+    let owned_array = exp_array
+        // as before we downcast to Float64Array
+        .as_primitive::<Float64Type>()
+        // non-obviously, we call clone here to get an owned `Float64Array`.
+        // Calling clone() is relatively inexpensive as it increments
+        // some ref counts but doesn't clone the data)
+        //
+        // Once we have the owned Float64Array we can drop the original
+        // exp_array (untyped) reference
+        .clone();
+
+    // We *MUST* drop this reference explicitly so that owned_array
+    // is the last remaining reference
+    drop(exp_array);
+
+    // at this point, exp_array is the only reference in this function. 
However,

Review Comment:
   Yes, sorry -- I think this meant to say "owned_array" is the only reference. 
I have tried to clarify the wording. This is a good catch



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to