jonathanc-n opened a new issue, #16425: URL: https://github.com/apache/datafusion/issues/16425
### Is your feature request related to a problem or challenge? This is part of #13181 for looking into different joins. ## What is a Single Join Single joins are similar to a regular left outer join however the build side can only have zero or one match with the probe side. If there are two or matches for one build side row, it will throw an error during runtime. The reason we need a separate join type for this is because other join types do not keep track of these number of matches. ## What is it used for Single joins are used to optimize scalar subqueries. In Neumann's paper for unnesting subqueries ([here](https://btw-2015.informatik.uni-hamburg.de/res/proceedings/Hauptband/Wiss/Neumann-Unnesting_Arbitrary_Querie.pdf)), he writes on dependent joins, which is just a join where every iteration of the left side will incur a search on the right side (right side dependent on left). Here is an example of a scalar subquery: ``` SELECT e.name, ( SELECT d.dept_name FROM departments d WHERE d.dept_id = e.dept_id ) AS department FROM employees e; ``` Scalar subqueries act as a type of dependent join due to the subquery needing to be calculated for every row on the left. We can eliminate this by creating a hash table out of the subquery side and probing with the build side. This cuts the `O(n^2)` runtime to `O(n)`. If the build side matches on more than two rows we call an error as it would be an invalid scalar subquery. ### Describe the solution you'd like - Optimize scalar subqueries into single joins - Using the existing hash join execution plan we can create a path for single join (duckdb does something similar to this) - Do checks on each probe to see if it matched multiple rows; call execution error if it did. ### Describe alternatives you've considered _No response_ ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org