What Nick said was correct.
What I should also state is that I am using the Python variant of Spark in
this case, not the Scala one.
I am looking to use the GUID prefix of the part-0 file to prevent a race
condition by using an S3 waiter for the part to appear, but to achieve this
I need to know the GUID value in advance.
I should add that I tried using a waiter on the _SUCCESS file, but that did
not prove successful: due to its small size compared to the part-0 file, it
seems to appear in S3 before the part-0 file does, even though it was
written afterwards.
--
Sent from: http://apache-spark-developers-list.10015
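For reference, the waiting logic I have in mind looks roughly like this. This is a minimal sketch: the `exists` predicate is a plain callable here, but in practice it would wrap a boto3 `head_object` call on the part file's key (which is exactly where the GUID is needed; the boto3 usage shown in comments is a hypothetical illustration, not tested code):

```python
import time

def wait_for_object(exists, timeout=300.0, interval=5.0):
    """Poll until exists() returns True or timeout (seconds) elapses.

    exists: a zero-argument callable; in practice it would wrap a
    boto3 head_object call on the part file's S3 key.
    Returns True if the object appeared, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if exists():
            return True
        time.sleep(interval)
    return False

# Hypothetical usage with boto3 (not executed here):
# s3 = boto3.client("s3")
# def part_exists():
#     try:
#         # Constructing this key requires knowing the GUID in advance.
#         s3.head_object(Bucket="my-bucket", Key=part_key)
#         return True
#     except s3.exceptions.ClientError:
#         return False
# wait_for_object(part_exists)
```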
I think what George is looking for is a way to determine ahead of time the
partition IDs that Spark will use when writing output.
George,
I believe this is an example of what you're looking for:
https://github.com/databricks/spark-redshift/blob/184b4428c1505dff7b4365963dc344197a92baa9/src/main/sc
If I understand your problem correctly, the prefix you provided is actually
"-" + UUID. You can generate one with a UUID generator such as
https://docs.python.org/3/library/uuid.html#uuid.uuid4.
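To illustrate the shape of that prefix, here is a quick sketch. The filename is the example quoted elsewhere in this thread, and the `part-<N>-<uuid>-c<attempt>` pattern is an assumption about Spark's output naming; note that a freshly generated `uuid4` will have the same form but will not match the value Spark chose:

```python
import re
import uuid

# Example part-file name quoted in this thread.
name = "part-0-b5265e7b-b974-4083-a66e-e7698258ca50-c000.csv"

# Assumed naming pattern: part-<partition>-<uuid>-c<attempt>.<ext>
PART_RE = re.compile(r"part-(?P<partition>\d+)-(?P<uuid>[0-9a-f-]{36})-c\d+\.")

m = PART_RE.search(name)
print(m.group("uuid"))  # -> b5265e7b-b974-4083-a66e-e7698258ca50

# uuid.uuid4() produces values of the same form, but a freshly
# generated UUID will not equal the one Spark picked for the write.
print(uuid.uuid4())
```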
I lack the vocabulary for this question, so please bear with my description
of the problem...
I am searching for a way to get the GUID prefix value that will be used to
write the parts of a file.
e.g.:
part-0-b5265e7b-b974-4083-a66e-e7698258ca50-c000.csv
I would like to get the prefix "0-b5265e7
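Since the UUID is not known before the write, one workaround is to discover the part file by pattern after the job finishes rather than predicting its name. A sketch, using a simulated local directory (against S3 one would instead list the output prefix, e.g. with boto3 `list_objects_v2`, which is not shown here):

```python
import glob
import os
import tempfile

# Simulate an output directory produced by a completed write.
outdir = tempfile.mkdtemp()
fname = "part-0-b5265e7b-b974-4083-a66e-e7698258ca50-c000.csv"
open(os.path.join(outdir, fname), "w").close()
open(os.path.join(outdir, "_SUCCESS"), "w").close()

# Discover the part file(s) by glob pattern instead of
# needing the GUID up front.
parts = sorted(glob.glob(os.path.join(outdir, "part-*")))
print(parts)
```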