Hi Marco,

A Scala dev here.
In short: yet another reason against Python :) Honestly, I've got no idea
why the code gives this output. I ran it with 3.1.1-rc1 and got exactly
the same results. Hoping the PySpark/Python devs will chime in and shed
more light on this.

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski

On Wed, Jan 20, 2021 at 2:07 PM Marco Wong <mck...@gmail.com> wrote:

> Dear Spark users,
>
> I ran the Python code below on a simple RDD, but it gave strange results.
> The filtered RDD contains elements that had been filtered away earlier.
> Any idea why this happened?
>
> ```
> rdd = spark.sparkContext.parallelize([0, 1, 2])
> for i in range(3):
>     print("RDD is ", rdd.collect())
>     print("Filtered RDD is ", rdd.filter(lambda x: x != i).collect())
>     rdd = rdd.filter(lambda x: x != i)
>     print("Result is ", rdd.collect())
>     print()
> ```
>
> which gave
>
> ```
> RDD is [0, 1, 2]
> Filtered RDD is [1, 2]
> Result is [1, 2]
>
> RDD is [1, 2]
> Filtered RDD is [0, 2]
> Result is [0, 2]
>
> RDD is [0, 2]
> Filtered RDD is [0, 1]
> Result is [0, 1]
> ```
>
> Thanks,
>
> Marco
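
P.S. A guess at what might be going on, since the symptom looks like
Python's late binding of closures: each `lambda x: x != i` closes over the
variable `i`, not its value, and because RDD transformations are lazy, the
lambdas are only evaluated when `collect()` triggers a job, by which time
`i` may have moved on. I'm not certain this is exactly how PySpark
serializes the closures, so treat the sketch below as an assumption. It
shows the late-binding effect in plain Python, with the usual workaround of
binding `i` as a default argument:

```
# Late binding: all three lambdas share the same variable `i`,
# so after the loop they all see its final value, 2.
fns = [lambda x: x != i for i in range(3)]
print([f(2) for f in fns])   # [False, False, False]

# Workaround: bind the *current* value of `i` as a default argument,
# which is evaluated once, when each lambda is created.
fns = [lambda x, i=i: x != i for i in range(3)]
print([f(2) for f in fns])   # [True, True, False]
```

Applied to the original loop (assuming `spark` is a live SparkSession), the
same trick should make each filter keep the value of `i` from its own
iteration rather than whatever `i` happens to be at collect time:

```
rdd = spark.sparkContext.parallelize([0, 1, 2])
for i in range(3):
    # i=i freezes the current loop value inside this lambda
    rdd = rdd.filter(lambda x, i=i: x != i)
print(rdd.collect())  # expected: [] -- each filter removes a different element
```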