ueshin commented on code in PR #50470: URL: https://github.com/apache/spark/pull/50470#discussion_r2027711400
##########
python/pyspark/sql/connect/column.py:
##########
@@ -461,6 +462,18 @@ def outer(self) -> ParentColumn:
         return Column(self._expr)
 
     def isin(self, *cols: Any) -> ParentColumn:
+        from pyspark.sql.connect.dataframe import DataFrame
+
+        if len(cols) == 1 and isinstance(cols[0], DataFrame):
+            if isinstance(self._expr, UnresolvedFunction) and self._expr._name == "struct":

Review Comment:
In SQL, for example:

```sql
select * from l where (a, b) in (select c, d from r)
```

`(a, b)` is implicitly `struct(a, b)`, so

```sql
select * from l where struct(a, b) in (select c, d from r)
```

is the same as the above.

To compare an actual struct value for `col` there, it has to be wrapped once more, as `struct(struct(a, b))`:

```sql
select * from l where struct(struct(a, b)) in (select struct(c as a, d as b) from r)
```

The same applies to this API:

```py
# Assumes an active `spark` session.
from pyspark.sql import functions as sf

spark.table("l").where(
    sf.struct(sf.struct("a", "b")).isin(
        spark.table("r").select(sf.col("c").alias("a"), sf.col("d").alias("b"))
    )
)
```

If the value is already of struct type, it can be used there as-is:

```sql
select * from l where sab in (select scd from r)
```

```py
spark.table("l").where(
    sf.col("sab").isin(
        spark.table("r").select("scd")
    )
)
```
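
The unwrapping rule described above can be sketched in plain Python, without Spark. This is only an illustration of the semantics, not the code in this PR: `Col`, `Struct`, and `in_subquery_values` are hypothetical names, and the sketch assumes that the `struct` branch in the diff unwraps the call into its children, which the excerpt does not show.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Col:
    """Hypothetical stand-in for a column reference."""
    name: str


@dataclass(frozen=True)
class Struct:
    """Hypothetical stand-in for a struct(...) function call."""
    fields: Tuple["Expr", ...]


Expr = Union[Col, Struct]


def in_subquery_values(expr: Expr) -> Tuple[Expr, ...]:
    """Return the values compared against the subquery's output columns.

    A top-level struct(...) is read as the SQL tuple (a, b), so it is
    unwrapped into its fields; anything else counts as a single value.
    """
    if isinstance(expr, Struct):
        return expr.fields
    return (expr,)


a, b = Col("a"), Col("b")

# struct(a, b) IN (SELECT c, d ...): two values against two columns.
two_values = in_subquery_values(Struct((a, b)))

# struct(struct(a, b)) IN (SELECT struct(...) ...): one struct value.
one_struct = in_subquery_values(Struct((Struct((a, b)),)))

# sab IN (SELECT scd ...): a struct-typed column passes through unchanged.
as_is = in_subquery_values(Col("sab"))
```

Only the outermost `struct` call is treated as tuple syntax. That is why an explicit struct value needs the extra wrapper, while a column that already has struct type does not.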