ueshin commented on code in PR #50470:
URL: https://github.com/apache/spark/pull/50470#discussion_r2027711400


##########
python/pyspark/sql/connect/column.py:
##########
@@ -461,6 +462,18 @@ def outer(self) -> ParentColumn:
         return Column(self._expr)
 
     def isin(self, *cols: Any) -> ParentColumn:
+        from pyspark.sql.connect.dataframe import DataFrame
+
+        if len(cols) == 1 and isinstance(cols[0], DataFrame):
+            if isinstance(self._expr, UnresolvedFunction) and self._expr._name == "struct":

Review Comment:
   In SQL, for example:
   
   ```sql
   select * from l where (a, b) in (select c, d from r)
   ```
   
   `(a, b)` is implicitly `struct(a, b)`, so
   
   ```sql
   select * from l where struct(a, b) in (select c, d from r)
   ```
   
   is the same as the above.
   
   To pass an actual struct value for `col` there, it needs to be wrapped once more, as `struct(struct(a, b))`:
   
   ```sql
   select * from l where struct(struct(a, b)) in (select struct(c as a, d as b) from r)
   ```
   
   The same applies to this API:
   
   ```py
   spark.table("l").where(
     sf.struct(sf.struct("a", "b")).isin(
       spark.table("r").select(
         sf.struct(sf.col("c").alias("a"), sf.col("d").alias("b"))
       )
     )
   )
   ```
   
   If the value is already a struct type, it can be used there as-is:
   
   ```sql
   select * from l where sab in (select scd from r)
   ```
   
   ```py
   spark.table("l").where(
     sf.col("sab").isin(
       spark.table("r").select("scd")
     )
   )
   ```
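
The matching rule described above can be sketched in plain Python. This is only a model of the semantics, not Spark code: `in_subquery` is a hypothetical helper, structs are modelled as tuples, and NULL handling is ignored. The assumption is that a top-level `struct(...)` on the left side is unpacked into one value per subquery column, while an already struct-typed value counts as a single value.

```python
from typing import Any, Iterable, Sequence, Tuple

Row = Tuple[Any, ...]


def in_subquery(values: Sequence[Any], subquery_rows: Iterable[Row]) -> bool:
    """Model `values IN (subquery)`: one left-hand value per subquery column."""
    lhs = tuple(values)
    rows = [tuple(row) for row in subquery_rows]
    for row in rows:
        if len(row) != len(lhs):
            raise ValueError(
                f"left side has {len(lhs)} value(s), subquery has {len(row)} column(s)"
            )
    return any(row == lhs for row in rows)


# Table r with columns (c, d).
r = [(1, "x"), (2, "y")]

# (a, b) IN (SELECT c, d FROM r): struct(a, b) is unpacked into two values.
two_values = in_subquery([1, "x"], r)

# struct(struct(a, b)) IN (SELECT struct(c AS a, d AS b) FROM r):
# one struct-typed value against one struct-typed column.
r_as_struct = [((c, d),) for c, d in r]
one_struct_value = in_subquery([(1, "x")], r_as_struct)

# sab IN (SELECT scd FROM r): an existing struct column is a single value.
no_match = in_subquery([(3, "z")], r_as_struct)
```

Under this model, passing a single struct value against the two-column subquery raises an error, which is why the extra `struct(...)` wrapper has to be paired with a one-column struct subquery.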
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

