Thank you, Imran.
I will check whether there is any memory waste.
Imran Rashid wrote on Tue, Nov 26, 2019 at 1:30 AM:
> I think Chang is right, but I also think this only comes up in limited
> scenarios. I initially thought it wasn't a bug, but after some more
> thought I have some concerns in light of the
Very well put, Imran. This is a variant of executor failure after an RDD has
been computed (including caching). In general, nondeterminism in Spark is
going to lead to inconsistency.
The only reasonable solution for us, at that time, was to make
pseudo-randomness repeatable and checkpoint after so
I think Chang is right, but I also think this only comes up in limited
scenarios. I initially thought it wasn't a bug, but after some more
thought I have some concerns in light of the issues we've had w/
nondeterministic RDDs, e.g. repartition().
Say I have code like this:
val cachedRDD = sc.text
Hi all,
I would like to share some information about the ARM CI for Spark:
Our team and the community are working on building and testing Spark master on ARM64
servers. After finding and fixing some issues[1], we have integrated two ARM
testing jobs[2] into the community CI (AMPLab Jenkins),
they run as daily jobs and have been stablel
Hi everyone
We are looking for participants to help us with a study on the
sustainability of free and open source software, ‘Mapping the co-production
of digital infrastructure by peer projects and firms’, which is funded by a
Sloan and Ford Foundations grant.
We are trying to learn about how com
Hmm, I haven't checked the code, but I think if an RDD is referenced in several
places, the correct behavior should be: when this RDD's data is needed, it
is computed and then cached only once; anything else should be treated
as a bug. If you suspect there's a race condition, you could create
a