Re: get method guid prefix for file parts for write

2020-09-25 Thread gpongracz
What Nick said was correct. I should also state that I am using PySpark in this case, not the Scala API. I am looking to use the guid prefix of part-0 to prevent a race condition by using an S3 waiter for the part to appear, but to achieve this, I need to know the guid value in adv

Re: get method guid prefix for file parts for write

2020-09-25 Thread gpongracz
I should add that I tried using a waiter on the _SUCCESS file, but it did not prove successful: because it is so small compared to the part-0 file, it seems to appear in S3 before the part-0 file, even though it was written afterwards. -- Sent from: http://apache-spark-developers-list.10015

get method guid prefix for file parts for write

2020-09-24 Thread gpongracz
I lack the vocabulary for this question, so please bear with my description of the problem... I am searching for a way to get the guid prefix value that will be used when writing the parts of a file, e.g.: part-0-b5265e7b-b974-4083-a66e-e7698258ca50-c000.csv I would like to get the prefix "0-b5265e7
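
For illustration only: once a part file has been written, the task index and guid can be recovered from its name with a regular expression. This is a minimal sketch in Python, assuming the name layout shown in the example above (part-<index>-<guid>-c<counter>.<extension>); the names PartName and parse_part_name are hypothetical. It does not obtain the guid in advance of the write, which is what the question asks for.

```python
import re
from typing import NamedTuple, Optional


class PartName(NamedTuple):
    """Pieces of a part file name (hypothetical helper type)."""
    index: str      # task index, e.g. "0"
    guid: str       # the guid embedded in the file name
    counter: str    # the digits after "-c", e.g. "000"
    extension: str  # everything after the first dot, e.g. "csv"


# Assumed layout: part-<index>-<guid>-c<counter>.<extension>
_PART_RE = re.compile(
    r"^part-(?P<index>\d+)"
    r"-(?P<guid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"-c(?P<counter>\d+)"
    r"\.(?P<extension>.+)$"
)


def parse_part_name(file_name: str) -> Optional[PartName]:
    """Return the parsed pieces, or None if the name does not match the assumed layout."""
    match = _PART_RE.match(file_name)
    if match is None:
        return None
    return PartName(**match.groupdict())


parsed = parse_part_name("part-0-b5265e7b-b974-4083-a66e-e7698258ca50-c000.csv")
if parsed is not None:
    # Rebuild the "<index>-<guid>" prefix described in the question.
    print(f"{parsed.index}-{parsed.guid}")
```

Names that do not follow the assumed layout, such as _SUCCESS, return None and can simply be skipped when listing the output directory.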

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2020-07-11 Thread gpongracz
As someone who mainly operates in AWS, it would be very welcome to have the option to use an updated version of Hadoop with PySpark sourced from PyPI. Acknowledging the issues of backwards compatibility... The most vexing issue is the lack of ability to use s3a STS, i.e. org.apache.hadoop.fs.s3a.Te