Bump + adding one more issue I fixed (and by chance there's a relevant report
on the user mailing list recently)
* [SPARK-30462][SS] Streamline the logic on file stream source and sink to
avoid memory issue [1]
The patch stabilizes the driver's memory usage when utilizing a huge metadata
log, which was t
Thanks for looping in more folks :)
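For anyone less familiar with that metadata log, here is a minimal sketch
(paths, schema, and app name are all made up) of the kind of long-running
file sink query it applies to. Each committed micro-batch records the files
it wrote under the sink's _spark_metadata directory, and it is that
ever-growing, periodically compacted log that can strain driver memory,
which is what the patch above is about.

    // Minimal sketch of a structured streaming file sink query (hypothetical
    // paths and schema). The sink tracks committed files under
    // /data/events/_spark_metadata, which grows with every micro-batch.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("file-sink-metadata-log").getOrCreate()

    val events = spark.readStream
      .format("json")
      .schema("id LONG, payload STRING")   // hypothetical input schema (DDL string)
      .load("/data/incoming")              // hypothetical input directory

    val query = events.writeStream
      .format("parquet")
      .option("path", "/data/events")               // sink directory; metadata log lives under _spark_metadata
      .option("checkpointLocation", "/ckpt/events") // hypothetical checkpoint location
      .start()

    query.awaitTermination()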
On Thu, Jun 25, 2020 at 7:41 PM Hyukjin Kwon wrote:
> Thank you so much, Holden.
>
> PS: I cc'ed some people who might be interested in this too FYI.
>
> On Fri, Jun 26, 2020 at 11:26 AM, Holden Karau wrote:
>
>> At the recommendation of Hyukjin, I'm converting the graceful
>> decommissioning work to an SPIP.
Thank you so much, Holden.
PS: I cc'ed some people who might be interested in this too FYI.
On Fri, Jun 26, 2020 at 11:26 AM, Holden Karau wrote:
> At the recommendation of Hyukjin, I'm converting the graceful
> decommissioning work to an SPIP. The SPIP document is at
> https://docs.google.com/document
At the recommendation of Hyukjin, I'm converting the graceful
decommissioning work to an SPIP. The SPIP document is at
https://docs.google.com/document/d/1EOei24ZpVvR7_w0BwBjOnrWRy4k-qTdIlx60FsHZSHA/edit?usp=sharing
and the associated JIRA is at
https://issues.apache.org/jira/browse/SPARK-20624. Th
I was trying to keep my email short and concise, but the rationale for
setting that to 1 by default is that it's safer. With algorithm version
2 you run the risk of bad data in cases where tasks fail, or even
duplicate data if a task fails and then succeeds on a reattempt (I don't know if
th
I think this is a Hadoop property that is just passed through? If the
default is different in Hadoop 3 we could mention that in the docs. I
don't know if we want to always set it to 1 as a Spark default, even
in Hadoop 3, right?
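To illustrate the passthrough: any spark.hadoop.* entry gets copied, with
the prefix stripped, into the Hadoop Configuration Spark hands to the
output committer, so a job can pin the algorithm version explicitly instead
of relying on whichever Hadoop default is on the classpath. A small sketch
(the app name is made up):

    // Sketch: pin the committer algorithm version explicitly so the job does
    // not depend on the Hadoop default. The "spark.hadoop." prefix is stripped
    // and the remainder is applied to the Hadoop Configuration used for commits.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("pin-committer-v1")  // hypothetical app name
      .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "1")
      .getOrCreate()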
On Thu, Jun 25, 2020 at 2:43 PM Waleed Fateem wrote:
>
> Hello!
>
> I noticed that in the documentation starting with 2.2.0 it states that the
> parameter spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version is 1
> by default.
Hello Bart,
Thank you for sharing these links; this was exactly what Tahsin and I were
looking for. It looks like there has already been a lot of discussion about
this, which is good to see.
In one of these pull requests, there is a comment about the number of
real-world use-cases for some kin
Hello!
I noticed that, starting with 2.2.0, the documentation states that the
parameter spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version is 1
by default:
https://issues.apache.org/jira/browse/SPARK-20107
I don't actually see this being set anywhere explicitly in the Spark code
and
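One way to see which value actually takes effect, rather than what the docs
say, is to read it straight off the Hadoop configuration Spark uses; a null
result means nothing has set it explicitly, so the built-in default of
Hadoop's FileOutputCommitter (which differs across Hadoop releases) is what
applies. A minimal sketch, assuming an existing SparkSession (the app name
is made up):

    // Sketch: inspect the value Spark would pass down to the Hadoop committer.
    // Prints null when neither Spark nor any spark.hadoop.* / site config set
    // it, in which case the FileOutputCommitter's own default applies.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("check-committer-version").getOrCreate()

    val v = spark.sparkContext.hadoopConfiguration
      .get("mapreduce.fileoutputcommitter.algorithm.version")

    println(s"mapreduce.fileoutputcommitter.algorithm.version = $v")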
I don't have a strong opinion on changing the default either, but I would
slightly prefer to have the option to switch the Hadoop version first, just
to stay safe.
To be clear, we're now mostly discussing the timing of when to make Hadoop
3.0.0 the default, and which change has to come first, right?