subject:"Re\: Spark DELETE FROM parallelism"

Re: Spark DELETE FROM parallelism

2021-05-25 Thread Huadong Liu

Thank you Anton! Follow up questions inline. On Tue, May 25, 2021 at 10:31 AM Anton Okolnychyi wrote: > The performance degradation you see is because Spark cannot push down > subquery expressions. That’s why Iceberg ends up scanning the complete > table. You will still rewrite only files that h

Re: Spark DELETE FROM parallelism

2021-05-25 Thread Anton Okolnychyi

The performance degradation you see is because Spark cannot push down subquery expressions. That’s why Iceberg ends up scanning the complete table. You will still rewrite only files that have matches but I assume joining the complete table will be expensive. In order to benefit from partition a