Thanks a lot Mich,
1) Do you mean coalescing the partitions that are about to be written? I
don't think this would take less time, because all the partitions have
data. The problem seems to be that Spark asks HMS for all partitions, even
though it is only writing 650.
Would such an improvement be something that could benefit Spark?
2
On Thu, Dec 5, 2024 at 1:16 AM Mich Talebzadeh
wrote:
> Hi Matteo,
>
> 1) You have an incompatible Metastore: The Hive Metastore version used by
> the EMR cluster (2.3.9) doesn't support the get_partition_locations method
> directly. Spark 3.5 tries to use this method, leading to
Hi Matteo,
1) You have an incompatible Metastore: The Hive Metastore version used by
the EMR cluster (2.3.9) doesn't support the get_partition_locations method
directly. Spark 3.5 tries to use this method, leading to fallback and
increased HMS (Hive Metastore Service) calls.
2) Large Number of Pa
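One way to address the metastore mismatch described in point 1 is to pin the Hive client Spark uses to the metastore's version. A hedged sketch (`your_app.py` is a placeholder; `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars` are standard Spark SQL configs, and Spark 3.5's built-in Hive client is itself 2.3.9, so `builtin` is usually the right value here):

```shell
# Sketch only: pin the Hive client to match the HMS version on the cluster.
spark-submit \
  --conf spark.sql.hive.metastore.version=2.3.9 \
  --conf spark.sql.hive.metastore.jars=builtin \
  your_app.py
```

Verify the effective values in the Spark UI's Environment tab, since EMR releases may override these defaults.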
Hello Community,
The Spark 3.5 application I am working on shows slowness, right at the time
of writing to a Hive table.
I'd like to ask you for some hints on how to mitigate this behaviour, if
possible.
The same application using Spark 2.4 ran "fine" within reasonable times,
with minimal cluster idl
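As a starting point for investigating this kind of slowdown, these are the Spark SQL settings that most directly influence how Spark talks to the Hive Metastore for partitioned tables. A `spark-defaults.conf` sketch; the values shown are the usual upstream defaults and are assumptions to verify against your EMR release:

```
# Prune partitions in the metastore instead of fetching all of them.
spark.sql.hive.metastorePartitionPruning          true
# Let Spark manage file-source partitions in the metastore and cache
# partition file metadata (cache size in bytes, default ~250 MB).
spark.sql.hive.manageFilesourcePartitions         true
spark.sql.hive.filesourcePartitionFileCacheSize   262144000
```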