Yes, that number is likely == 0 in any real workload ...

On Thu, Jul 3, 2014 at 8:01 AM, Mridul Muralidharan <mri...@gmail.com>
wrote:

> On Thu, Jul 3, 2014 at 11:32 AM, Reynold Xin <r...@databricks.com> wrote:
> > On Wed, Jul 2, 2014 at 3:44 AM, Mridul Muralidharan <mri...@gmail.com>
> > wrote:
> >
> >> > The other thing we do need is the location of blocks. This is
> >> > actually just O(n) because we just need to know where the map was run.
> >>
> >> For well-partitioned data, won't this involve a lot of unwanted
> >> requests to nodes which are not hosting data for a reducer (and a
> >> lack of ability to throttle)?
> >>
> >
> > Was that a question? (I'm guessing it is). What do you mean exactly?
>
>
> I was not sure if I understood the proposal correctly, hence the
> query: if I understood it right, the number of wasted requests goes
> up by num_reducers * avg_nodes_not_hosting_data.
>
> Of course, if avg_nodes_not_hosting_data == 0, then we are fine!
>
> Regards,
> Mridul
>
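
For anyone following the arithmetic in the quoted estimate, here is a
minimal sketch of it in Python. It assumes (as the quoted message does)
that every reducer sends one request to every node where a map ran,
whether or not that node holds data for it. The function name, the
node ids and the sample layout are hypothetical and only for
illustration; none of this is Spark code.

```python
def wasted_requests(num_nodes, hosting_nodes_per_reducer):
    """Count requests sent to nodes that hold no data for the requesting reducer.

    Assumption (hypothetical model): each reducer contacts all `num_nodes`
    nodes once. `hosting_nodes_per_reducer` holds one set of node ids per
    reducer, naming the nodes that actually have map output for it.
    """
    return sum(num_nodes - len(hosts) for hosts in hosting_nodes_per_reducer)


# Hypothetical layout: 4 nodes, 3 reducers.
layout = [{0, 1, 2, 3}, {0, 1}, {2}]
total = wasted_requests(4, layout)

# Same figure via the formula in the thread:
# num_reducers * (average number of nodes not hosting data per reducer).
num_reducers = len(layout)
avg_not_hosting = sum(4 - len(hosts) for hosts in layout) / num_reducers
assert total == round(num_reducers * avg_not_hosting)
```

When every node hosts data for every reducer, each term in the sum is
zero, which is the "== 0" case discussed above.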
