v1 on GCS isn't safe either, as promotion from task attempt to
successful task is a directory rename: fast and atomic on HDFS, but
O(files) and non-atomic on GCS.
If I can get that Hadoop 3.3.5 RC out soon, the manifest committer will be
there to test: https://issues.apache.org/jira/browse/MAPREDUCE-7341
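
To make the rename point concrete, here is a toy sketch in Python. It is not the GCS connector or the Hadoop committer; it assumes a flat key/value store where a "directory rename" has to be emulated as one copy and one delete per object. The class name, the simplified paths and the fail_after knob are all made up for illustration.

```python
class ToyObjectStore:
    """In-memory stand-in for a flat object store (no real directories)."""

    def __init__(self):
        self.objects = {}  # object key -> bytes
        self.ops = 0       # number of per-object moves performed

    def put(self, key, data):
        self.objects[key] = data

    def rename_prefix(self, src, dst, fail_after=None):
        """Emulate a directory rename as copy + delete for every object.

        fail_after is a hypothetical knob: abort once that many objects
        have been moved, to expose the partially renamed state.
        """
        keys = sorted(k for k in self.objects if k.startswith(src))
        for moved, key in enumerate(keys):
            if fail_after is not None and moved >= fail_after:
                raise RuntimeError("simulated failure mid-rename")
            self.objects[dst + key[len(src):]] = self.objects[key]  # copy
            del self.objects[key]                                   # delete
            self.ops += 1


store = ToyObjectStore()
for i in range(4):
    # Simplified, illustrative layout; the real committer paths are longer.
    store.put(f"_temporary/attempt_0/part-{i}", b"data")

try:
    store.rename_prefix("_temporary/attempt_0/", "_temporary/task_0/", fail_after=2)
except RuntimeError:
    pass  # the "rename" died halfway through

committed = sorted(k for k in store.objects if k.startswith("_temporary/task_0/"))
stranded = sorted(k for k in store.objects if k.startswith("_temporary/attempt_0/"))
print("under task dir:   ", committed)
print("still in attempt: ", stranded)
```

The work grows with the number of files, and a failure partway through leaves some files under the destination and the rest under the source, which is the state an atomic HDFS rename never exposes.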
Some users have observed issues like the one you're describing that are
related to the job commit algorithm, which is controlled by the configuration
property spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.
Hadoop's default value for this setting is 2. You can find a description of
the algorithms in
Is your Spark job batch or streaming?
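
For anyone who wants to compare runs with the commit algorithm pinned explicitly, a minimal Python sketch of building the spark-submit arguments follows. The property name is the one quoted above and --conf is the standard spark-submit option; the helper name and the my_job.py script are hypothetical. Pinning version 1 here is only for comparison and is not a claim that it is safe on GCS.

```python
COMMITTER_KEY = "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version"


def committer_conf_args(version):
    """Return spark-submit arguments that pin the commit algorithm version."""
    if version not in (1, 2):
        # FileOutputCommitter only defines algorithm versions 1 and 2.
        raise ValueError(f"unsupported algorithm version: {version}")
    return ["--conf", f"{COMMITTER_KEY}={version}"]


# my_job.py is a placeholder for the actual application.
args = ["spark-submit"] + committer_conf_args(1) + ["my_job.py"]
print(" ".join(args))
```

Setting the value explicitly, rather than relying on whichever default the Spark and Hadoop versions in use happen to pick, makes it easier to say which algorithm a given run with missing output actually used.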
From: Sandeep Vinayak
Sent: Tuesday, October 18, 2022 19:48
To: dev@spark.apache.org
Subject: Missing data in spark output
Hi,
We have observed similar behavior in older versions of Spark, but we
are currently using 3.3.0, where we have not seen such issues.
Which version of Spark and Hadoop are you using?
On 18/10/2022 19:48, Sandeep Vinayak wrote:
Hello Everyone,
We are recently observing an intermittent