Can you elaborate? Not 100% sure if I understand what you mean.

On Thu, Nov 20, 2014 at 7:14 PM, Shixiong Zhu <zsxw...@gmail.com> wrote:
> Is it possible for Spark to buffer the mapOutputStatuses (Array[Byte]) messages according to the size of the mapOutputStatuses that have already been sent but not yet ACKed? The buffer will be cheap since the mapOutputStatuses messages are identical and the memory cost is only a few pointers.
>
> Best Regards,
> Shixiong Zhu
>
> 2014-09-20 16:24 GMT+08:00 Reynold Xin <r...@databricks.com>:
>
>> BTW - a partial solution here: https://github.com/apache/spark/pull/2470
>>
>> This doesn't address the 0-size block problem yet, but makes my large job on hundreds of terabytes of data much more reliable.
>>
>> On Fri, Jul 4, 2014 at 2:28 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
>>
>> > In our clusters, the number of containers we can get is high but memory per container is low: which is why avg_nodes_not_hosting data is rarely zero for ML tasks :-)
>> >
>> > To update - to unblock our current implementation efforts, we went with broadcast, since it is intuitively easier and a minimal change, and we compress the array as bytes in TaskResult.
>> > This is then stored in disk-backed maps to remove memory pressure on the master and workers (otherwise MapOutputTracker becomes a memory hog).
>> >
>> > But I agree: a compressed bitmap to represent 'large' blocks (anything larger than maxBytesInFlight, actually), and probably the existing one to track non-zero blocks, should be fine (we should not really track zero output for a reducer - it is just a waste of space).
>> >
>> > Regards,
>> > Mridul
>> >
>> > On Fri, Jul 4, 2014 at 3:43 AM, Reynold Xin <r...@databricks.com> wrote:
>> > > Note that in my original proposal, I was suggesting we could track whether block size = 0 using a compressed bitmap. That way we can still avoid requests for zero-sized blocks.
>> > >
>> > > On Thu, Jul 3, 2014 at 3:12 PM, Reynold Xin <r...@databricks.com> wrote:
>> > >
>> > >> Yes, that number is likely == 0 in any real workload ...
>> > >>
>> > >> On Thu, Jul 3, 2014 at 8:01 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
>> > >>
>> > >>> On Thu, Jul 3, 2014 at 11:32 AM, Reynold Xin <r...@databricks.com> wrote:
>> > >>> > On Wed, Jul 2, 2014 at 3:44 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
>> > >>> >
>> > >>> >> > The other thing we do need is the location of blocks. This is actually just O(n) because we just need to know where the map was run.
>> > >>> >>
>> > >>> >> For well-partitioned data, won't this involve a lot of unwanted requests to nodes that are not hosting data for a reducer (and a lack of ability to throttle)?
>> > >>> >
>> > >>> > Was that a question? (I'm guessing it is). What do you mean exactly?
>> > >>>
>> > >>> I was not sure if I understood the proposal correctly, hence the query: if I understood it right, the number of wasted requests goes up by num_reducers * avg_nodes_not_hosting data.
>> > >>>
>> > >>> Of course, if avg_nodes_not_hosting data == 0, then we are fine!
>> > >>>
>> > >>> Regards,
>> > >>> Mridul
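The trade-off discussed in this thread (a bitmap of non-empty blocks so reducers can skip zero-sized ones, plus, in one reading of the 'large' blocks suggestion, exact sizes only for blocks above a threshold such as maxBytesInFlight and a single average for the rest) can be illustrated with a minimal Python sketch. This is not Spark's MapOutputTracker or MapStatus code: the names CompressedMapStatus, fetch_requests and wasted_requests, the large_threshold parameter, and the use of a plain Python int as an uncompressed bitset in place of a real compressed bitmap are all hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CompressedMapStatus:
    """Hypothetical per-map-task summary of output block sizes (one block per reducer).

    Instead of one size per reducer, it keeps:
      * non_empty: a bitset (a plain Python int, standing in for a compressed
        bitmap) whose bit i is set iff the block for reducer i has data;
      * large_sizes: exact sizes only for blocks above large_threshold;
      * avg_small_size: one average size for all other non-empty blocks.
    """
    location: str
    non_empty: int
    large_sizes: Dict[int, int]
    avg_small_size: int

    @classmethod
    def from_sizes(cls, location: str, sizes: List[int],
                   large_threshold: int) -> "CompressedMapStatus":
        non_empty = 0
        large_sizes: Dict[int, int] = {}
        small_total = small_count = 0
        for reducer, size in enumerate(sizes):
            if size == 0:
                continue  # zero-sized blocks are simply absent from the bitset
            non_empty |= 1 << reducer
            if size > large_threshold:
                large_sizes[reducer] = size  # keep the exact size of "large" blocks
            else:
                small_total += size
                small_count += 1
        avg = small_total // small_count if small_count else 0
        return cls(location, non_empty, large_sizes, avg)

    def estimated_size(self, reducer: int) -> int:
        """0 for empty blocks, exact size for large ones, the average otherwise."""
        if not (self.non_empty >> reducer) & 1:
            return 0
        return self.large_sizes.get(reducer, self.avg_small_size)


def fetch_requests(statuses: List[CompressedMapStatus], reducer: int,
                   skip_empty: bool = True) -> List[str]:
    """Locations a reducer would contact; without skipping, it asks every map location."""
    return [s.location for s in statuses
            if not skip_empty or s.estimated_size(reducer) > 0]


def wasted_requests(statuses: List[CompressedMapStatus], num_reducers: int) -> int:
    """Requests that hit an empty block if nothing is skipped, summed over all
    reducers, i.e. num_reducers * avg_nodes_not_hosting data."""
    return sum(1 for r in range(num_reducers) for s in statuses
               if s.estimated_size(r) == 0)


if __name__ == "__main__":
    statuses = [
        CompressedMapStatus.from_sizes("host-a", [100, 0, 5000, 0], large_threshold=1000),
        CompressedMapStatus.from_sizes("host-b", [0, 10, 0, 30], large_threshold=1000),
        CompressedMapStatus.from_sizes("host-c", [0, 0, 40, 0], large_threshold=1000),
    ]
    print(fetch_requests(statuses, 0, skip_empty=False))  # ['host-a', 'host-b', 'host-c']
    print(fetch_requests(statuses, 0))                    # ['host-a']
    print(wasted_requests(statuses, num_reducers=4))      # 7
```

Without skipping, the wasted requests add up to num_reducers * avg_nodes_not_hosting data (here 4 * 1.75 = 7). With the bitmap, each reducer contacts only the nodes that hold data for it, at the cost of approximating small block sizes by their per-map average.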