zhuqi-lucas commented on code in PR #14245:
URL: https://github.com/apache/datafusion/pull/14245#discussion_r1929438149


##########
datafusion/sqllogictest/test_files/joins.slt:
##########
@@ -4247,8 +4247,10 @@ logical_plan
 physical_plan
 01)CoalesceBatchesExec: target_batch_size=3, fetch=2
 02)--HashJoinExec: mode=CollectLeft, join_type=Full, on=[(c1@0, c1@0)]
-03)----MemoryExec: partitions=1, partition_sizes=[1]
-04)----MemoryExec: partitions=1, partition_sizes=[1]
+03)----GlobalLimitExec: skip=0, fetch=2

Review Comment:
   > I didn't think it is correct to push a limit below a Full join -- a 
FullJoin will create null values to match any misisng rows 🤔 So even if you 
limited both sides you'll still get rows out of there that shouldn't be ...
   
   
   
   Thank you @alamb for review, this is a good point,  and we don't do anything 
for the full join limit in current PR.
   
   
   **The logical plan already push down full join limit since:**
   [#12963](https://github.com/apache/datafusion/pull/12963)
   
   For example the following code:
   ```
   ## Test !join.on.is_empty() && join.filter.is_none()
   query TT
   EXPLAIN SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 2;
   ----
   logical_plan
   01)Limit: skip=0, fetch=2
   02)--Full Join: t0.c1 = t1.c1
   03)----Limit: skip=0, fetch=2
   04)------TableScan: t0 projection=[c1, c2], fetch=2
   05)----Limit: skip=0, fetch=2
   06)------TableScan: t1 projection=[c1, c2, c3], fetch=2
   ```
   
   
   And the physical plan will apply the limit pushdown, but without current PR, 
the child limit will be overridden by parent limit, so it does not  show the 
limit in physical plan before.
   
   **I suggest we can create a following issue to discuss do we need to 
fallback or change the full join push down limit case:**
   [#12963](https://github.com/apache/datafusion/pull/12963)
   
   What's your opinion? Thanks a lot!
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to