Hello all!

I'm trying to filter some rows in my DataFrame. I created a list of ids and used this construction:

    df_new = df.filter(df.user.isin(list_users))

The original DataFrame (df) has 29711562 rows, but the new one has 5394805.

So I decided to try another method:

    df_new = df.join(df_ids, df.user == df_ids.user, how='inner')

Here df_ids is a DataFrame whose rows are ids (the ids are unique). I wanted to find the rows with common ids this way, but again I got a new DataFrame that is bigger than the previous one.

Does anyone know the right way to implement this?

Thank you!
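For anyone comparing the two approaches in the question, here is a minimal plain-Python sketch (no Spark required) of how the row counts can differ. It is only a model of the semantics, not the poster's actual data: the sample rows and the helper names `filter_isin` and `inner_join` are hypothetical. It assumes the usual behaviour that an `isin` filter keeps each row at most once, while an inner join emits one output row per matching pair, so any duplicate ids on the right-hand side multiply the matching rows.

```python
from collections import Counter

# Hypothetical sample data standing in for df (one dict per row).
rows = [
    {"user": 1, "event": "a"},
    {"user": 1, "event": "b"},
    {"user": 2, "event": "c"},
    {"user": 3, "event": "d"},
]


def filter_isin(rows, ids):
    """Model of df.filter(df.user.isin(ids)): each row is kept at most once."""
    wanted = set(ids)
    return [r for r in rows if r["user"] in wanted]


def inner_join(rows, id_rows):
    """Model of an inner join on user: one output row per matching pair."""
    counts = Counter(id_rows)  # how many times each id appears on the right
    out = []
    for r in rows:
        # A row is repeated once per matching id; ids with no match give 0.
        out.extend([r] * counts[r["user"]])
    return out


unique_ids = [1, 2]          # ids really are unique
duplicated_ids = [1, 1, 2]   # id 1 appears twice

filtered = filter_isin(rows, duplicated_ids)
joined_unique = inner_join(rows, unique_ids)
joined_dupes = inner_join(rows, duplicated_ids)

print(len(filtered), len(joined_unique), len(joined_dupes))
```

If the join returns more rows than the filter, one thing worth checking is whether the ids really are unique, for example by comparing `df_ids.count()` with `df_ids.select('user').distinct().count()`. In PySpark, a join with `how='left_semi'` keeps each left-hand row at most once, which matches the filter's behaviour more closely than `how='inner'`.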
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/filter-function-works-incorretly-Python-tp29099.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.