Re: Is RDD thread safe?

2019-11-25 Thread Chang Chen
Thank you, Imran. I will check whether there is memory waste or not. Imran Rashid wrote on Tue, Nov 26, 2019 at 1:30 AM: > I think Chang is right, but I also think this only comes up in limited > scenarios. I initially thought it wasn't a bug, but after some more > thought I have some concerns in light of the

Re: Is RDD thread safe?

2019-11-25 Thread Mridul Muralidharan
Very well put, Imran. This is a variant of executor failure after an RDD has been computed (including caching). In general, nondeterminism in Spark is going to lead to inconsistency. The only reasonable solution for us, at that time, was to make pseudo-randomness repeatable and checkpoint after so

Re: Is RDD thread safe?

2019-11-25 Thread Imran Rashid
I think Chang is right, but I also think this only comes up in limited scenarios. I initially thought it wasn't a bug, but after some more thought I have some concerns in light of the issues we've had with nondeterministic RDDs, e.g. repartition(). Say I have code like this: val cachedRDD = sc.text

Status of Spark testing on ARM64

2019-11-25 Thread Tianhua huang
Hi all, I will give you some information about the ARM CI of Spark: Our team and the community are working on building and testing Spark master on ARM64 servers. After finding and fixing some issues[1], we have integrated two ARM testing jobs[2] into the community CI (AMPLAB Jenkins); they run as daily jobs and have been stablel

Apache/Spark community members are invited to take part in the Express 2019 F/OSS-Firms Survey!

2019-11-25 Thread moneil
Hi everyone, We are looking for participants to help us with a study on the sustainability of free and open source software, ‘Mapping the co-production of digital infrastructure by peer projects and firms’, which is funded by a Sloan and Ford Foundations grant. We are trying to learn about how com

Re: Is RDD thread safe?

2019-11-25 Thread Weichen Xu
Hmm, I haven't checked the code, but I think if an RDD is referenced in several places, the correct behavior should be: when this RDD's data is needed, it is computed and then cached only once; otherwise it should be treated as a bug. If you suspect there's a race condition, you could create a