Dear Spark users,
I ran the Python code below on a simple RDD, but it gave strange results:
the filtered RDD contains elements that had already been filtered out in
an earlier iteration. Any idea why this happens?
```
rdd = spark.sparkContext.parallelize([0, 1, 2])
for i in range(3):
    print("RDD is ", rdd.collect())
    print("Filtered RDD is ", rdd.filter(lambda x: x != i).collect())
    rdd = rdd.filter(lambda x: x != i)
    print("Result is ", rdd.collect())
    print()
```
which gave
```
RDD is [0, 1, 2]
Filtered RDD is [1, 2]
Result is [1, 2]
RDD is [1, 2]
Filtered RDD is [0, 2]
Result is [0, 2]
RDD is [0, 2]
Filtered RDD is [0, 1]
Result is [0, 1]
```
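One possible explanation, offered as an assumption rather than a confirmed diagnosis: a Python lambda looks up `i` when it is called, not when it is defined, and RDD transformations are evaluated lazily, so every chained filter may end up comparing against the current value of `i`. The Spark-free sketch below imitates a chain of deferred filters and seems to reproduce the last line of the output above. The helper `apply_all` and the list names `late` and `bound` are hypothetical, introduced only for illustration.

```python
def apply_all(predicates, values):
    """Keep only the values that pass every predicate (a stand-in for chained lazy filters)."""
    return [v for v in values if all(p(v) for p in predicates)]


data = [0, 1, 2]

late = []   # lambdas that read `i` only when they are finally called
bound = []  # lambdas that capture the value of `i` at definition time

for i in range(3):
    late.append(lambda x: x != i)
    bound.append(lambda x, i=i: x != i)  # default argument freezes the current i

# After the loop i == 2, so every lambda in `late` tests x != 2.
late_result = apply_all(late, data)
# Each lambda in `bound` kept its own i (0, 1, 2), so nothing survives.
bound_result = apply_all(bound, data)

print(late_result)   # [0, 1]
print(bound_result)  # []
```

If this is indeed the cause, writing the filter as `lambda x, i=i: x != i` in the Spark loop would be the analogous change, though that has not been verified here.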
Thanks,
Marco