UBarney commented on code in PR #19411:
URL: https://github.com/apache/datafusion/pull/19411#discussion_r2636743834


##########
datafusion/common/src/config.rs:
##########
@@ -468,6 +468,25 @@ config_namespace! {
         /// metadata memory consumption
         pub batch_size: usize, default = 8192
 
+        /// A perfect hash join will be considered if the number of rows on the build
+        /// side is below this threshold. This provides a fast path for joins with
+        /// very small build sides, bypassing the density check.
+        ///
+        /// TODO: Currently only supports cases where left_side.num_rows() < u32::MAX.
+        /// Support for left_side.num_rows() >= u32::MAX will be added in the future.
+        pub perfect_hash_join_small_build_threshold: usize, default = 1024
+
+        /// The minimum required density of join keys on the build side to consider a
+        /// perfect hash join. Density is calculated as:
+        /// `(number of rows) / (max_key - min_key + 1)`.
+        /// A perfect hash join may be used if the actual key density exceeds this
+        /// value. For example, a value of 0.99 means the keys must fill at least
+        /// 99% of their value range.
+        ///
+        /// TODO: Currently only supports cases where left_side.num_rows() < u32::MAX.
+        /// Support for left_side.num_rows() >= u32::MAX will be added in the future.
+        pub perfect_hash_join_min_key_density: f64, default = 0.99

Review Comment:
   That's a great point.
   I'll add some benchmarks to compare the performance at different densities,
   including 75%, to find the optimal value for our use case. I'll update this
   based on the results.
   Thanks for the suggestion!
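
For reference while comparing thresholds, here is a minimal sketch of the density rule described in the doc comment above. It is not the PR's implementation: the function names `key_density` and `perfect_hash_join_candidate` are hypothetical, the keys are assumed to be `i64`, and the density comparison is assumed to be inclusive (`>=`), following the "at least 99%" wording.

```rust
/// Key density as defined in the doc comment:
/// (number of rows) / (max_key - min_key + 1).
/// Returns None for an empty input or an inverted key range.
/// Hypothetical helper, assuming i64 join keys.
fn key_density(num_rows: usize, min_key: i64, max_key: i64) -> Option<f64> {
    if num_rows == 0 || max_key < min_key {
        return None;
    }
    // Widen to i128 so that max_key - min_key + 1 cannot overflow.
    let range = (max_key as i128) - (min_key as i128) + 1;
    Some(num_rows as f64 / range as f64)
}

/// Hypothetical decision function combining the two settings:
/// a small build side skips the density check, otherwise the
/// density must reach the configured minimum (assumed inclusive).
fn perfect_hash_join_candidate(
    num_rows: usize,
    min_key: i64,
    max_key: i64,
    small_build_threshold: usize,
    min_key_density: f64,
) -> bool {
    // Fast path: very small build sides bypass the density check.
    if num_rows < small_build_threshold {
        return true;
    }
    match key_density(num_rows, min_key, max_key) {
        Some(density) => density >= min_key_density,
        None => false,
    }
}
```

Under these assumptions, 7,500 rows with keys spanning 0..=9,999 have a density of 0.75, so they would be rejected at the proposed default of 0.99 and accepted if the setting were lowered to 0.75.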
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

