Hello All!
I'm trying to filter some rows in my DataFrame.
I created a list of ids and use the following construction:
df_new = df.filter(df.user.isin(list_users))
The original DataFrame (df) consists of 29711562 rows, but the new one has
5394805.
OK, so I decided to try another method:
df_new = df.join(df_ids, df.user == df_ids.user, how='inner')
df_ids is a DataFrame whose rows are the ids (the ids are unique). I wanted to
find the common ids with this method, but again I got a new DataFrame, and it
is bigger than the previous one.
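For comparison, here is a minimal plain-Python sketch of how the two approaches above should relate on toy data. It is not Spark code: rows, filter_isin, inner_join and duplicated are hypothetical names made up for this illustration, and the data is invented. It only models the row-count behaviour. A membership filter keeps each matching row once, while an inner join emits one row per matching pair, so the join result can only be larger if some id appears more than once on the id side.

```python
from collections import Counter

# Toy stand-in for df: each row is a dict with a 'user' key (invented data).
rows = [{"user": u, "value": i}
        for i, u in enumerate(["a", "b", "a", "c", "d", "b"])]


def filter_isin(rows, ids):
    """Keep a row once if its user is in ids (models an isin-style filter)."""
    wanted = set(ids)
    return [r for r in rows if r["user"] in wanted]


def inner_join(rows, ids):
    """Emit one output row per matching (row, id) pair (models an inner join)."""
    return [dict(r) for r in rows for i in ids if r["user"] == i]


def duplicated(ids):
    """Return the ids that occur more than once, sorted."""
    return sorted(k for k, n in Counter(ids).items() if n > 1)


unique_ids = ["a", "b"]      # every id appears once
dup_ids = ["a", "b", "a"]    # 'a' appears twice

n_filter = len(filter_isin(rows, unique_ids))
n_join_unique = len(inner_join(rows, unique_ids))
n_join_dup = len(inner_join(rows, dup_ids))

# With unique ids the two counts agree; with a repeated id the join grows.
print(n_filter, n_join_unique, n_join_dup)
print(duplicated(dup_ids))
```

If the same pattern holds in Spark, one possible check (an assumption about the cause, not a confirmed diagnosis) would be to look for repeated ids with something like df_ids.groupBy('user').count().filter('count > 1'), and to compare against df.join(df_ids, on='user', how='left_semi'), which keeps each row of df at most once.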
Maybe someone knows the right way to implement this?
Thank you!