Hi, I'm trying to run queries with many values in the IN operator. What I'm seeing is that once the list grows past roughly 10K values, the IN operator gets noticeably slower.
For example, the following takes about 20 seconds to run:

df = spark.range(0, 100000, 1, 1)
df.where('id in ({})'.format(','.join(map(str, range(100000))))).count()

Any ideas how to improve this? Is it a bug?

-- 
Maciek Bryński