Re: Avoiding unnecessary build changes until tests are in better shape

2015-08-05 Thread Josh Rosen
+1. I've been holding off on reviewing / merging patches like the run-tests-jenkins Python refactoring for exactly this reason. On 8/5/15 11:24 AM, Patrick Wendell wrote: Hey All, Was wondering if people would be willing to avoid merging build changes until we have put the tests in better shape…

Avoiding unnecessary build changes until tests are in better shape

2015-08-05 Thread Patrick Wendell
Hey All, Was wondering if people would be willing to avoid merging build changes until we have put the tests in better shape. The reason is that build changes are the most likely to cause downstream issues with the test matrix, and it's very difficult to reverse engineer which patches caused which…

Re:

2015-08-05 Thread Jonathan Winandy
Hello! You could try something like this: def exists[T](rdd: RDD[T])(f: T => Boolean, n: Long): Boolean = { val context: SparkContext = rdd.sparkContext val grp: String = Random.alphanumeric.take(10).mkString context.setJobGroup(grp, "exist") val count: Accumulator[Long] = context.accumulator…
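The preview cuts off at the accumulator. A sketch of how the idea could be completed (everything past the accumulator line is an assumption, not Jonathan's original code; Spark 1.x Accumulator API, replaced by AccumulatorV2 in Spark 2.0):

    import scala.util.Random
    import org.apache.spark.{Accumulator, SparkContext, SparkException}
    import org.apache.spark.rdd.RDD

    def exists[T](rdd: RDD[T])(f: T => Boolean, n: Long): Boolean = {
      val context: SparkContext = rdd.sparkContext
      val grp: String = Random.alphanumeric.take(10).mkString
      context.setJobGroup(grp, "exist")
      val count: Accumulator[Long] = context.accumulator(0L)
      // Run the counting job in a background thread; the thread inherits
      // the job group set above.
      val job = new Thread {
        override def run(): Unit =
          try rdd.foreach { t => if (f(t)) count += 1L }
          catch { case _: SparkException => () } // thrown on cancellation
      }
      job.start()
      // Poll on the driver. Accumulator updates arrive as tasks finish,
      // so cancellation happens at task granularity, not per element.
      while (job.isAlive && count.value < n) Thread.sleep(50)
      if (job.isAlive) context.cancelJobGroup(grp)
      job.join()
      count.value >= n
    }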

Re:

2015-08-05 Thread Feynman Liang
qualifying_function() will be executed on each partition in parallel; stopping all parallel execution after the first instance satisfying qualifying_function() would mean effectively making the computation sequential. On Wed, Aug 5, 2015 at 9:05 AM, Sandeep Giri wrote: > Okay…
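To make that trade-off concrete, here is a sketch (mine, not from the thread; the helper name and batch size are made up, and the runJob overload shown is the one without the allowLocal flag found in later 1.x releases). An early-stopping scan has to launch a few partitions at a time from the driver and check between launches, so the batches run one after another even though each batch is parallel internally:

    import org.apache.spark.rdd.RDD

    def existsEarlyStop[T](rdd: RDD[T])(f: T => Boolean): Boolean = {
      val sc = rdd.sparkContext
      val total = rdd.partitions.length
      val batchSize = 4 // larger batches: more parallelism, later stopping
      var next = 0
      var found = false
      while (!found && next < total) {
        val batch = next until math.min(next + batchSize, total)
        // runJob evaluates only the requested partitions.
        val hits = sc.runJob(rdd, (it: Iterator[T]) => it.exists(f), batch)
        found = hits.contains(true)
        next += batchSize
      }
      found
    }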

Re: Consistent recommendation for submitting spark apps to YARN, --master yarn --deploy-mode x vs --master yarn-x

2015-08-05 Thread Guru Medasani
Following up on this thread to see if anyone has thoughts or opinions on the approach mentioned below. Guru Medasani gdm...@gmail.com > On Aug 3, 2015, at 10:20 PM, Guru Medasani wrote: > > Hi, > > I was looking at the spark-submit and spark-shell --help on both (Spark > 1.3.1 and Spark 1…
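For readers following the thread, the two spellings being compared look like this (class and jar names are hypothetical; both forms were accepted in Spark 1.x, and the explicit --deploy-mode form is the one later releases steer toward):

    # explicit form
    spark-submit --master yarn --deploy-mode cluster --class com.example.App app.jar

    # shorthand form
    spark-submit --master yarn-cluster --class com.example.App app.jar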

Re:

2015-08-05 Thread Sandeep Giri
Okay. I think I got it now. Yes, take() does not need to be called more than once. I got the impression that we wanted to bring elements to the driver node and then run our qualifying_function on the driver node. Now I am back to the question I started with: could there be an approach where the qualifying_function…

Re:

2015-08-05 Thread Sean Owen
take only brings n elements to the driver, which is probably still a win if n is small. I'm not sure what you mean by only taking a count argument -- what else would be an arg to take? On Wed, Aug 5, 2015 at 4:49 PM, Sandeep Giri wrote: > Yes, but in the take() approach we will be bringing the data to the driver…

[no subject]

2015-08-05 Thread Sandeep Giri
Yes, but in the take() approach we will be bringing the data to the driver, and it is no longer distributed. Also, take() takes only a count as its argument, which means that every time we would be transferring the redundant elements. Regards, Sandeep Giri, +1 347 781 4573 (US) +91-953-899-8962 (IN) www.

Re: New Feature Request

2015-08-05 Thread Sean Owen
I don't think countApprox is appropriate here unless approximation is OK. But more generally, counting everything matching a filter requires applying the filter to the whole data set, which seems like the thing to be avoided here. The take approach is better since it would stop after finding n matches…
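Spelled out, that approach is roughly (a minimal sketch, assuming the helper name; take on a filtered RDD scans partitions in growing batches and stops launching tasks once n elements have been collected):

    import org.apache.spark.rdd.RDD

    def atLeastN[T](rdd: RDD[T])(f: T => Boolean, n: Int): Boolean =
      rdd.filter(f).take(n).length >= n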

Re: New Feature Request

2015-08-05 Thread Sandeep Giri
Hi Jonathan, does that guarantee a result? I do not see that it is really optimized. Hi Carsten, how does the following code work: data.filter(qualifying_function).take(n).count() >= n Also, as per my understanding, in both of the approaches you mentioned the qualifying function will be executed…
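One note on that snippet (my reading, not stated in the thread): take(n) returns a plain Array on the driver, and count on a Scala Array expects a predicate, so the trailing count() does not type-check as written. The check is presumably meant as:

    data.filter(qualifying_function).take(n).length >= n

Because filter is lazy, qualifying_function only runs on the partitions that take actually scans, not on the whole dataset.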