Hmm, I think I see what Jingnan means. The lambda function is x != i, and i is not evaluated when the lambda is defined, only when it is called. So the pipelined rdd is effectively rdd.filter(lambda x: x != i).filter(lambda x: x != i).filter(lambda x: x != i), with every filter looking up the latest value of i rather than having the value of i at definition time substituted in. Does that make sense to you, Sean?
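To illustrate what I mean, here is a rough sketch in plain Python, without Spark. It is only an analogy: the built-in lazy filter() stands in for rdd.filter, list() stands in for collect(), and the names late_bound and early_bound are mine, not from the original code. I am assuming the original loop was something like "for i in range(3)".

```python
data = range(5)

# Late binding: each lambda looks up i when it is called, not when it is
# defined. filter() is lazy, so nothing runs until list() consumes it, and
# by then the loop has finished and i == 2 for all three filters.
late = iter(data)
for i in range(3):
    late = filter(lambda x: x != i, late)
late_bound = list(late)

# Early binding: a default argument is evaluated at definition time, so
# each lambda keeps the value i had when that filter was added.
early = iter(data)
for i in range(3):
    early = filter(lambda x, i=i: x != i, early)
early_bound = list(early)

print(late_bound)   # [0, 1, 3, 4]  (only 2 is filtered out)
print(early_bound)  # [3, 4]        (0, 1 and 2 are all filtered out)
```

If the same thing is happening in PySpark, binding i with a default argument (lambda x, i=i: x != i) should be one way to get the result Sean expects.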
On Wed, 20 Jan 2021 at 15:51, Sean Owen <sro...@gmail.com> wrote:

> No, because the final rdd is really the result of chaining 3 filter
> operations. They should all execute. It _should_ work like
> "rdd.filter(...).filter(...).filter(...)"
>
> On Wed, Jan 20, 2021 at 9:46 AM Zhu Jingnan <jingnanzh...@gmail.com>
> wrote:
>
>> I thought that was the right result.
>>
>> The rdd runs on a lazy basis, so every time rdd.collect() is executed,
>> i will be updated to its latest value, so only one will be filtered out.
>>
>> Regards
>>
>> Jingnan