> Would it be problematic to simply prefix a random number, or
> timestamp, on the front of the file name to make it unique?
Bucketed tables rely on the file-name prefix to determine which bucket a file
belongs to.
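Here is a minimal sketch of prefix-based bucket assignment. For illustration only, it assumes that the leading digits before the first underscore are the bucket number, as in the 0000_0 names used in this thread. `bucket_of` and `group_by_bucket` are hypothetical helpers, not Hive code.

```python
import re

# Assumed (hypothetical) naming rule: the leading digits of the file name are
# the bucket number; any "_<n>" or "_Copy_<n>" suffix is ignored for bucketing.
_BUCKET_PREFIX = re.compile(r"^(\d+)_")


def bucket_of(file_name: str) -> int:
    """Return the bucket number encoded in the file-name prefix."""
    m = _BUCKET_PREFIX.match(file_name)
    if m is None:
        raise ValueError(f"not a bucket file: {file_name!r}")
    return int(m.group(1))


def group_by_bucket(file_names):
    """Group file names by the bucket their prefix points at."""
    buckets = {}
    for name in sorted(file_names):
        buckets.setdefault(bucket_of(name), []).append(name)
    return buckets


if __name__ == "__main__":
    print(group_by_bucket(["0000_0", "0000_0_Copy_1", "0001_0"]))
```

Under this assumed parser, a timestamp prefix such as 1699999999_0000_0 is read as bucket 1699999999. That is why a random or timestamp prefix can't simply be bolted onto the front of the name.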
So if you have a bucketed table and insert into it twice, the bucket ends up as
0000_0 + 0000_0_Copy_1
and both files logically belong to the 1st bucket. If this is a sorted table,
reading that bucket requires a sort-merge of the two files, not reading them one
after the other.
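To illustrate the sorted case, here is a minimal sketch. It assumes each file's rows are already sorted by the sort key, with plain integers standing in for rows. Python's `heapq.merge` performs the sort-merge, while concatenating the files one after the other does not produce sorted output. `read_sorted_bucket` is a hypothetical helper.

```python
import heapq


def read_sorted_bucket(files):
    """Sort-merge already-sorted per-file row lists into one sorted list."""
    return list(heapq.merge(*files))


if __name__ == "__main__":
    base = [1, 4, 9]    # rows from 0000_0, sorted by the sort key
    copy = [2, 3, 10]   # rows from 0000_0_Copy_1, also sorted
    print(read_sorted_bucket([base, copy]))  # sort-merge: sorted result
    print(base + copy)                       # one-after-other: not sorted
```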
There's a set of race conditions in the loop that picks the next free
_Copy_<n> name when the file system is weakly consistent, like S3. That is why
Hive managed tables switched to delta_<id>/0000_0 instead of _Copy_<n> starting
in Hive 3.0.
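Here is a simplified sketch of that race. It assumes a naming loop that probes for the first unused _Copy_<n> name; `next_free_name` is hypothetical. The store is modelled as a dict, and each writer works from a listing that does not yet show the other writer's file, as can happen with a weakly consistent listing.

```python
def next_free_name(existing, base="0000_0"):
    """Hypothetical naming loop: try base, then base_Copy_1, base_Copy_2, ..."""
    if base not in existing:
        return base
    n = 1
    while f"{base}_Copy_{n}" in existing:
        n += 1
    return f"{base}_Copy_{n}"


def simulate_interleaved_writers():
    """Two inserts list the store before either one has written its file."""
    store = {"0000_0": "insert #1"}   # object store: name -> contents
    listing_a = set(store)            # writer A's (stale) view
    listing_b = set(store)            # writer B's (stale) view
    name_a = next_free_name(listing_a)
    name_b = next_free_name(listing_b)
    store[name_a] = "insert #2"
    store[name_b] = "insert #3"       # same name: silently replaces insert #2
    return name_a, name_b, store


if __name__ == "__main__":
    print(simulate_interleaved_writers())
```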
The "id" is stored in the table metadata, so no two queries will use the same
delta_<id> directory.
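Here is a sketch of why allocating the id centrally avoids the race. It assumes a hypothetical `TableMetadata` object whose id counter is updated atomically; a lock stands in for whatever the metastore actually does. The delta_<id> naming follows the simplified form used above, not Hive's exact on-disk format.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TableMetadata:
    """Hypothetical stand-in for the table metadata that hands out delta ids."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = 1

    def allocate_id(self) -> int:
        # Read-and-increment under a lock, so no two callers get the same id.
        with self._lock:
            allocated = self._next_id
            self._next_id += 1
            return allocated


def target_path(meta: TableMetadata, bucket_file: str = "0000_0") -> str:
    """Each insert writes into its own delta_<id>/ directory; no name probing."""
    return f"delta_{meta.allocate_id()}/{bucket_file}"


if __name__ == "__main__":
    meta = TableMetadata()
    with ThreadPoolExecutor(max_workers=8) as pool:
        paths = list(pool.map(lambda _: target_path(meta), range(100)))
    print(len(paths), len(set(paths)))
```

The bucket file inside each delta directory keeps its plain 0000_0 name, so the prefix-based bucket assignment still works.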
Cheers,
Gopal