Re: Spark 2.4 to Spark 3.5 migration - waiting for HMS

2024-12-04 Thread Matteo Moci
Thanks a lot Mich, 1) do you mean coalescing the partitions that are about to be written? I don't think this will take less time, because all partitions have data. The problem seems to be that it asks HMS for all partitions, even though it is only writing 650. Would an improvement here be something that benefits Spark? 2

Re: Spark 2.4 to Spark 3.5 migration - waiting for HMS

2024-12-04 Thread Matteo Moci
Thanks Mich, 1)
On Thu, Dec 5, 2024 at 1:16 AM Mich Talebzadeh wrote:
> Hi Matteo,
>
> 1) You have an incompatible Metastore: The Hive Metastore version used by
> the EMR cluster (2.3.9) doesn't support the get_partition_locations method
> directly. Spark 3.5 tries to use this method, leading to

Re: Spark 2.4 to Spark 3.5 migration - waiting for HMS

2024-12-04 Thread Mich Talebzadeh
Hi Matteo, 1) You have an incompatible Metastore: The Hive Metastore version used by the EMR cluster (2.3.9) doesn't support the get_partition_locations method directly. Spark 3.5 tries to use this method, leading to a fallback and an increased number of Hive Metastore Service (HMS) calls. 2) Large Number of Pa

Spark 2.4 to Spark 3.5 migration - waiting for HMS

2024-12-04 Thread Matteo Moci
Hello Community, The Spark 3.5 application I am working on is slow right at the point of writing to a Hive table. I'd like to ask for some hints on how to mitigate this behaviour, if possible. The same application using Spark 2.4 ran "fine" within reasonable times, with minimal cluster idl