Re:

Feynman Liang Wed, 05 Aug 2015 10:04:17 -0700

qualifying_function() will be executed on each partition in parallel;
stopping all parallel execution after the first instance satisfying
qualifying_function() would mean that you would have to effectively make
the computation sequential.


On Wed, Aug 5, 2015 at 9:05 AM, Sandeep Giri <sand...@knowbigdata.com>
wrote:

> Okay. I think I got it now. Yes take() does not need to be called more
> than once. I got the impression that we wanted to bring elements to the
> driver node and then run out qualifying_function on driver_node.
>
> Now, I am back to my question which I started with: Could there be an
> approach where the qualifying_function() does not get called after an
> element has been found?
>
>
> Regards,
> Sandeep Giri,
> +1 347 781 4573 (US)
> +91-953-899-8962 (IN)
>
> www.KnowBigData.com. <http://KnowBigData.com.>
> Phone: +1-253-397-1945 (Office)
>
> [image: linkedin icon] <https://linkedin.com/company/knowbigdata> [image:
> other site icon] <http://knowbigdata.com>  [image: facebook icon]
> <https://facebook.com/knowbigdata> [image: twitter icon]
> <https://twitter.com/IKnowBigData> <https://twitter.com/IKnowBigData>
>
>
> On Wed, Aug 5, 2015 at 9:21 PM, Sean Owen <so...@cloudera.com> wrote:
>
>> take only brings n elements to the driver, which is probably still a win
>> if n is small. I'm not sure what you mean by only taking a count argument
>> -- what else would be an arg to take?
>>
>> On Wed, Aug 5, 2015 at 4:49 PM, Sandeep Giri <sand...@knowbigdata.com>
>> wrote:
>>
>>> Yes, but in the take() approach we will be bringing the data to the
>>> driver and is no longer distributed.
>>>
>>> Also, the take() takes only count as argument which means that every
>>> time we would transferring the redundant elements.
>>>
>>>
>

Re:

Reply via email to