I am not sure why, but the mailing list is saying: "This post has NOT been accepted by the mailing list yet".
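
To make the question quoted below concrete, here is a plain-Python sketch (no Spark involved) of the output I am after. It is only an illustration of the desired semantics, not a Spark solution: the helper name `other_purchases` is made up for this sketch, and rows are told apart by position because the toy data repeats trans_id 5.

```python
# Plain-Python illustration of the desired result; this is NOT Spark code.
# The helper name `other_purchases` is hypothetical, chosen for this sketch.
rows = [
    (1, "user1", "prod1", "good"),
    (2, "user1", "prod2", "good"),
    (3, "user1", "prod3", "good"),
    (4, "user2", "prod3", "bad"),
    (5, "user2", "prod4", "good"),
    (5, "user2", "prod5", "good"),
]


def other_purchases(rows):
    """For each row, list the 'good' products the same user bought in all OTHER rows."""
    result = []
    for i, (trans_id, user_id, _prod, _flag) in enumerate(rows):
        others = [
            prod
            for j, (_trans, user, prod, flag) in enumerate(rows)
            # Skip the current row itself, stay within the same user,
            # and keep only rows flagged 'good'.
            if j != i and user == user_id and flag == "good"
        ]
        result.append((trans_id, user_id, others))
    return result


for row in other_purchases(rows):
    print(row)
```

So transaction 1 ends up with [prod2, prod3], and the 'bad' transaction 4 still sees the good purchases from the user's other rows.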

On Mon, 3 Apr 2017 at 20:52 mathewwicks [via Apache Spark User List] <ml-node+s1001560n28558...@n3.nabble.com> wrote:

> Here is an example to illustrate my point.
>
> In this toy example, we are collecting a list of the other products that
> each user has bought, and appending it as a new column. (Also note, that we
> are filtering on some arbitrary column 'good_bad'.)
>
> I would like to know if we support NOT including the CURRENT ROW in the
> PARTITION BY.
> (E.g. transaction 1 would have `other_purchases = [prod2, prod3]` rather
> than `other_purchases = [prod1, prod2, prod3]`)
>
> ------------------- Code Below -------------------
>
> df = spark.createDataFrame([
>     (1, "user1", "prod1", "good"),
>     (2, "user1", "prod2", "good"),
>     (3, "user1", "prod3", "good"),
>     (4, "user2", "prod3", "bad"),
>     (5, "user2", "prod4", "good"),
>     (5, "user2", "prod5", "good")],
>     ("trans_id", "user_id", "prod_id", "good_bad")
> )
> df.show()
>
> df = df.selectExpr(
>     "trans_id",
>     "user_id",
>     "COLLECT_LIST(CASE WHEN good_bad == 'good' THEN prod_id END) OVER(PARTITION BY user_id) AS other_purchases"
> )
> df.show()
> ----------------------------------------------------

--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Do-we-support-excluding-the-current-row-in-PARTITION-BY-windowing-functions-tp28558p28559.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.