Hi All,
I have a use case where we run multiple Spark SQL queries in parallel, each
writing to a different partition of the same table, to speed things up.
However, it looks like the different applications end up using the same
_temporary directory, for example:
/user/hive/db/tb/_temporary/0/task_2021031910
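
To make the collision concrete, here is a minimal sketch of how I understand the staging path to be built. It assumes the layout used by Hadoop's FileOutputCommitter, i.e. <output path>/_temporary/<app attempt id>, with the attempt id 0 as seen in the path above. The helper name job_attempt_path is hypothetical and only for illustration; this is not Spark's actual code.

```python
from pathlib import PurePosixPath

# Directory name assumed from the path shown above (and from Hadoop's
# FileOutputCommitter, which stages job output under "_temporary").
PENDING_DIR_NAME = "_temporary"


def job_attempt_path(table_location: str, app_attempt_id: int = 0) -> PurePosixPath:
    """Hypothetical helper: staging dir as <output>/_temporary/<appAttemptId>."""
    return PurePosixPath(table_location) / PENDING_DIR_NAME / str(app_attempt_id)


# Two independent applications writing different partitions of the same table.
# Neither the partition nor the application identity is part of the path.
table = "/user/hive/db/tb"
app_a = job_attempt_path(table)
app_b = job_attempt_path(table)

print(app_a)           # /user/hive/db/tb/_temporary/0
print(app_a == app_b)  # True
```

Since the staging directory depends only on the table location and the attempt id, both applications resolve to the same directory, so cleanup by one job could plausibly interfere with files the other is still writing.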
The PR can be found here: https://github.com/apache/spark/pull/31905
On 19.03.21 at 10:55, Enrico Minack wrote:
I'll sketch out a PR so we can talk code and move the discussion there.
On 18.03.21 at 14:55, Wenchen Fan wrote:
I think a listener-based API makes sense for streaming (since yo