What Nick said was correct.
I should also state that I am using the Python variant of Spark (PySpark) in
this case, not the Scala one.
I am looking to use the GUID prefix of part-0 to prevent a race condition, by
using an S3 waiter for the part file to appear; but to achieve this, I need to
know the GUID value in advance.
I should add that I tried using a waiter on the _SUCCESS file, but that did
not work: because it is so small compared to the part-0 file, it seems to
become visible in S3 before the part-0 file does, even though it was written
afterwards.
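To illustrate the waiter approach: a minimal sketch using boto3's `object_exists` waiter, assuming the write UUID were somehow known in advance. The bucket name, output prefix, and `part_key` helper are all hypothetical, not part of Spark's API.

```python
from typing import Optional


def part_key(prefix: str, write_uuid: str, part_num: int = 0) -> str:
    """Build the expected part-file key from a known write UUID (hypothetical
    helper; mirrors the part-<n>-<uuid>-c000.csv naming seen in the output)."""
    return f"{prefix}/part-{part_num}-{write_uuid}-c000.csv"


def wait_for_part(bucket: str, key: str, delay: int = 5, attempts: int = 60) -> None:
    """Block until HeadObject succeeds for the key, or the waiter times out."""
    import boto3  # local import so the pure helper above has no AWS dependency

    s3 = boto3.client("s3")
    s3.get_waiter("object_exists").wait(
        Bucket=bucket,
        Key=key,
        WaiterConfig={"Delay": delay, "MaxAttempts": attempts},
    )
```

Usage would be `wait_for_part("my-bucket", part_key("output/run1", uuid))` — which is exactly where the problem lies, since the UUID is not known before the write.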
--
Sent from: http://apache-spark-developers-list.10015
I lack the vocabulary for this question, so please bear with my description of
the problem...
I am searching for a way to get the GUID prefix value that will be used when
writing the part files of a dataset.
eg:
part-0-b5265e7b-b974-4083-a66e-e7698258ca50-c000.csv
I would like to get the prefix "0-b5265e7b-b974-4083-a66e-e7698258ca50".
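Since that GUID is generated by the writer, one workaround (my own suggestion, not anything Spark documents) is to recover it after the fact by listing the output prefix and parsing the part filename. The bucket/prefix names are hypothetical; the regex just matches the part-<n>-<uuid>-c000 pattern from the example above.

```python
import re
from typing import List, Optional

# Matches Spark-style part names, e.g. part-0-<uuid>-c000.csv
PART_RE = re.compile(
    r"part-\d+-([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})-c000"
)


def extract_write_uuid(key: str) -> Optional[str]:
    """Pull the GUID out of a part-file key, or None if it does not match."""
    m = PART_RE.search(key)
    return m.group(1) if m else None


def find_part_uuids(bucket: str, prefix: str) -> List[str]:
    """List the output prefix and return the UUIDs of any part files found."""
    import boto3  # local import keeps extract_write_uuid usable without AWS

    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    keys = [obj["Key"] for obj in resp.get("Contents", [])]
    return [u for u in (extract_write_uuid(k) for k in keys) if u is not None]
```

This only helps once the part file exists, of course; it does not give you the GUID before the write.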
As someone who mainly operates in AWS, it would be very welcome to have the
option to use an updated version of Hadoop with PySpark sourced from PyPI.
Acknowledging the issues of backwards compatibility...
The most vexing issue is the inability to use s3a STS, i.e.
org.apache.hadoop.fs.s3a.Te
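For reference, newer Hadoop releases (2.8+) do ship an S3A credentials provider for STS temporary credentials, org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider; assuming that is the class being referred to, the configuration would look roughly like this (property names per the Hadoop S3A documentation, values are placeholders):

```
spark.hadoop.fs.s3a.aws.credentials.provider  org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
spark.hadoop.fs.s3a.access.key                <temporary access key>
spark.hadoop.fs.s3a.secret.key                <temporary secret key>
spark.hadoop.fs.s3a.session.token             <session token>
```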