Here is an example to illustrate my question. In this toy example, we are collecting a list of the other products that each user has bought, and appending it as a new column. (Also note that we are filtering on an arbitrary column, 'good_bad'.)
I would like to know if we support NOT including the CURRENT ROW in the OVER(PARTITION BY xxx) windowing function. For example, transaction 1 would have `other_purchases = [prod2, prod3]` rather than `other_purchases = [prod1, prod2, prod3]`.

*------------------- Code Below -------------------*
df = spark.createDataFrame([
    (1, "user1", "prod1", "good"),
    (2, "user1", "prod2", "good"),
    (3, "user1", "prod3", "good"),
    (4, "user2", "prod3", "bad"),
    (5, "user2", "prod4", "good"),
    (6, "user2", "prod5", "good")],
    ("trans_id", "user_id", "prod_id", "good_bad")
)
df.show()

df = df.selectExpr(
    "trans_id",
    "user_id",
    "COLLECT_LIST(CASE WHEN good_bad == 'good' THEN prod_id END) OVER(PARTITION BY user_id) AS other_purchases"
)
df.show()
*----------------------------------------------------*

Here is a Stack Overflow link: https://stackoverflow.com/questions/43180723/spark-sql-excluding-the-current-row-in-partition-by-windowing-functions

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Do-we-support-excluding-the-CURRENT-ROW-in-PARTITION-BY-windowing-functions-tp28565.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
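To make the expected output concrete, here is a plain-Python sketch (not part of the original post) of the semantics being asked for: for each row, collect the 'good' prod_ids of all OTHER rows in the same user_id partition, excluding the current row. The comments also note one commonly suggested Spark-side workaround; whether it applies depends on your Spark version, so treat it as an assumption to verify.

```python
# Plain-Python sketch of "COLLECT_LIST ... OVER (PARTITION BY user_id)"
# with the current row excluded. In Spark itself, one commonly suggested
# workaround (assumption - verify on your version) is to collect the full
# list and then strip the current row's value afterwards, e.g. with
# array_remove (Spark 2.4+), which works here because prod_id is unique
# within each partition.

rows = [
    (1, "user1", "prod1", "good"),
    (2, "user1", "prod2", "good"),
    (3, "user1", "prod3", "good"),
    (4, "user2", "prod3", "bad"),
    (5, "user2", "prod4", "good"),
    (6, "user2", "prod5", "good"),
]

def other_purchases(rows):
    # Map trans_id -> list of 'good' prod_ids bought by the same user
    # in OTHER transactions (current row excluded).
    result = {}
    for trans_id, user_id, _, _ in rows:
        result[trans_id] = [
            p for t, u, p, gb in rows
            if u == user_id and gb == "good" and t != trans_id
        ]
    return result

print(other_purchases(rows))
# transaction 1 gets ['prod2', 'prod3'] - prod1 (its own row) is excluded
```

Note that the CASE WHEN in the original query already keeps 'bad' rows out of the collected lists (COLLECT_LIST skips NULLs); the open question is only about excluding the current row itself.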