Re: [PR] feat: Support `PiecewiseMergeJoin` to speed up single range predicate joins [datafusion]

via GitHub Sun, 10 Aug 2025 20:36:22 -0700


jonathanc-n commented on code in PR #16660:
URL: https://github.com/apache/datafusion/pull/16660#discussion_r2265607929



##########
datafusion/physical-plan/src/joins/piecewise_merge_join.rs:
##########
@@ -0,0 +1,2059 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow::array::{new_null_array, Array, RecordBatchOptions};
+use arrow::compute::take;
+use arrow::{
+    array::{
+        ArrayRef, BooleanBufferBuilder, RecordBatch, UInt32Array, UInt32Builder,
+        UInt64Array, UInt64Builder,
+    },
+    compute::{concat_batches, sort_to_indices, take_record_batch},
+    util::bit_util,
+};
+use arrow_schema::{ArrowError, Schema, SchemaRef, SortOptions};
+use datafusion_common::NullEquality;
+use datafusion_common::{
+    exec_err, internal_err, plan_err, utils::compare_rows, JoinSide, Result, ScalarValue,
+};
+use datafusion_execution::{
+    memory_pool::{MemoryConsumer, MemoryReservation},
+    RecordBatchStream, SendableRecordBatchStream,
+};
+use datafusion_expr::{JoinType, Operator};
+use datafusion_functions_aggregate_common::min_max::{max_batch, min_batch};
+use datafusion_physical_expr::equivalence::join_equivalence_properties;
+use datafusion_physical_expr::{
+    LexOrdering, OrderingRequirements, PhysicalExpr, PhysicalExprRef, PhysicalSortExpr,
+};
+use datafusion_physical_expr_common::physical_expr::fmt_sql;
+use futures::{Stream, StreamExt, TryStreamExt};
+use parking_lot::Mutex;
+use std::fmt::Formatter;
+use std::{cmp::Ordering, task::ready};
+use std::{sync::Arc, task::Poll};
+
+use crate::execution_plan::{boundedness_from_children, EmissionType};
+
+use crate::joins::sort_merge_join::compare_join_arrays;
+use crate::joins::utils::{
+    get_final_indices_from_shared_bitmap, symmetric_join_output_partitioning,
+};
+use crate::{handle_state, DisplayAs, DisplayFormatType, ExecutionPlanProperties};
+use crate::{
+    joins::{
+        utils::{
+            build_join_schema, BuildProbeJoinMetrics, OnceAsync, OnceFut,
+            StatefulStreamResult,
+        },
+        SharedBitmapBuilder,
+    },
+    metrics::ExecutionPlanMetricsSet,
+    spill::get_record_batch_memory_size,
+    ExecutionPlan, PlanProperties,
+};
+
+/// `PiecewiseMergeJoinExec` is a join execution plan that only evaluates a single range filter.
+///
+/// The physical planner will choose to evaluate this join when there is only one range predicate. This
+/// is a binary expression which contains [`Operator::Lt`], [`Operator::LtEq`], [`Operator::Gt`], or
+/// [`Operator::GtEq`].
+/// Examples:
+///  - `col0` < `colb`, `col0` <= `colb`, `col0` > `colb`, `col0` >= `colb`
+///
+/// Since the join only supports range predicates, equijoins are not supported in `PiecewiseMergeJoinExec`;
+/// however, you can first evaluate another join and run `PiecewiseMergeJoinExec` if left with one range
+/// predicate.

Review Comment:
   Ah, sorry. I was thinking of it as joining the results of two hash joins, which seems to be a more common workload for PWMJ.
   
   > We should do something similar for PMJ: use inequality as the strong pre-filter, and allow the remaining ANDed predicates still applicable. Though for this example how to choose between HJ and PMJ is hard, we can use a simple heuristic for now.
   
   Wow, this could be a nice idea. If the hash join has very low selectivity and a filter still has to run afterwards, that would likely be much slower than letting PWMJ apply a highly selective range predicate first and then evaluating the equijoin condition on what remains. I do think this kind of data would be much rarer, though. Definitely worth noting down.
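
To make the quoted pre-filter idea concrete, here is a minimal sketch of it. It is not the implementation in this PR: it assumes an inner join on plain `i64` keys with no nulls, and `Row`, `range_key`, `eq_key`, and `piecewise_join_lt` are hypothetical names chosen for illustration. Both sides are sorted on the range key, the `<` predicate is resolved with a single forward-moving cursor, and the remaining equality predicate is applied as a residual filter on the candidates.

```rust
/// Hypothetical row type: one column for the range predicate, one for the
/// residual equality predicate.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Row {
    pub range_key: i64,
    pub eq_key: i64,
}

/// Inner join on `left.range_key < right.range_key AND left.eq_key = right.eq_key`.
///
/// The range predicate acts as the pre-filter; the equality predicate is
/// checked only on the candidate pairs that survive it.
pub fn piecewise_join_lt(left: &[Row], right: &[Row]) -> Vec<(Row, Row)> {
    // Sort both sides ascending on the range key.
    let mut left_sorted = left.to_vec();
    let mut right_sorted = right.to_vec();
    left_sorted.sort_by_key(|r| r.range_key);
    right_sorted.sort_by_key(|r| r.range_key);

    let mut out = Vec::new();
    // `prefix_end` is the number of left rows whose key is strictly less than
    // the current right key. Because the right side is sorted, it never moves
    // backwards.
    let mut prefix_end = 0;
    for r in &right_sorted {
        while prefix_end < left_sorted.len()
            && left_sorted[prefix_end].range_key < r.range_key
        {
            prefix_end += 1;
        }
        // Every row in `left_sorted[..prefix_end]` satisfies the range
        // predicate; apply the remaining ANDed predicate as a filter.
        for l in &left_sorted[..prefix_end] {
            if l.eq_key == r.eq_key {
                out.push((*l, *r));
            }
        }
    }
    out
}
```

Whether this beats a hash join followed by a range filter depends on which predicate is more selective, which is the heuristic question raised above. The sketch only shows that the two predicates can be evaluated in either order.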
   



