On 1 Aug 2015, at 18:26, Ruslan Dautkhanov <dautkha...@gmail.com> wrote:
> If your network is bandwidth-bound, you'll see that setting jumbo frames (MTU 9000) may increase bandwidth by up to ~20%.
>
> http://docs.hortonworks.com/HDP2Alpha/index.htm#Hardware_Recommendations_for_Hadoop.htm
> "Enabling Jumbo Frames across the cluster improves bandwidth"

+1. You can also get better checksums of packets, so the (very small but non-zero) risk of corrupted network packets drops a bit more.

> If the Spark workload is not network bandwidth-bound, I can see it'll be a few percent to no improvement.

Put differently: it shouldn't hurt.

The shuffle phase is the most network-heavy, especially as it can span the entire cluster; that's where backbone ("bisection") bandwidth can become the bottleneck and jobs can interfere with each other. Scheduling work close to the HDFS data means that HDFS reads should often be local (the TCP stack gets bypassed entirely), or at least rack-local (sharing the switch, not the backbone).

There are other things at play too, as the slides talk about:

-stragglers: often a sign of pending HDD failure, as reads are retried. The classic Hadoop MR engine detects these, can spin up alternate mappers (if you enable speculation), and will blacklist the node for further work. Sometimes, though, that straggling is just unbalanced data: some bits of work may be computationally a lot harder, slowing things down.

-contention for work on the nodes. In YARN you request how many "virtual cores" you want (ops get to define the mapping of virtual to physical), with each node having a finite set of cores, but...
 -unless CPU throttling is turned on, competing processes can take up more CPU than they asked for.
 -that virtual:physical core setting may be off.

There's also disk IOP contention: two jobs trying to get at the same spindle, even though there are lots of disks on the server. There's not much you can do about that (today).

A key takeaway from that talk, which applies to all tuning talks, is: get data from your real workloads. There's some good HTrace instrumentation in HDFS these days; I haven't looked at Spark's instrumentation to see how they hook up. You can also expect to have some network monitoring (sFlow, ...) which you could use to see if the backbone is overloaded. Don't forget the Linux tooling either: iotop &c.

There's lots of room to play here. Once you've got the data you can see where to focus, then decide how much time to spend trying to tune it.

-steve

> --
> Ruslan Dautkhanov
>
> On Sat, Aug 1, 2015 at 6:08 PM, Simon Edelhaus <edel...@gmail.com> wrote:
>
> Hmmmm.... 2% huh.
>
> -- ttfn
> Simon Edelhaus
> California 2015
>
> On Sat, Aug 1, 2015 at 3:45 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>
> https://spark-summit.org/2015/events/making-sense-of-spark-performance/
>
> On Sat, Aug 1, 2015 at 3:24 PM, Simon Edelhaus <edel...@gmail.com> wrote:
>
> Hi All!
>
> How important would a significant performance improvement to TCP/IP itself be, in terms of overall job performance? Which part would be most significantly accelerated? Would it be HDFS?
>
> -- ttfn
> Simon Edelhaus
> California 2015
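
For the Spark side of the speculation, locality, and vcore points discussed above, here is a minimal sketch of the relevant configuration properties. The property names come from the Spark 1.x configuration docs; the values shown are illustrative, not recommendations, and worth checking against the version you actually run:

  import org.apache.spark.{SparkConf, SparkContext}

  // Sketch only: Spark-side analogues of the MR-era knobs described above.
  val conf = new SparkConf()
    .setAppName("tuning-sketch")
    .setMaster("local[*]")                       // or let spark-submit supply the master
    // Stragglers: re-launch suspiciously slow tasks elsewhere (speculation).
    .set("spark.speculation", "true")
    .set("spark.speculation.multiplier", "1.5")  // "slow" = 1.5x the median task time
    .set("spark.speculation.quantile", "0.75")   // only once 75% of tasks have finished
    // Locality: how long the scheduler waits for a node-local (then rack-local)
    // slot before running the task elsewhere and pulling data over the network.
    .set("spark.locality.wait", "3s")
    // YARN vcores: how many virtual cores each executor asks for.
    .set("spark.executor.cores", "4")

  val sc = new SparkContext(conf)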
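
And on "get data from your real workloads": one way to pull per-task timing out of Spark itself is a SparkListener. A minimal sketch, assuming the public TaskMetrics/TaskInfo fields of Spark 1.x (again, double-check against your version); the class name is just for illustration:

  import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

  // Sketch only: log per-task run and GC time, plus the host, so stragglers
  // and hot nodes show up in the data rather than in guesswork.
  class TaskTimingListener extends SparkListener {
    override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
      val info = taskEnd.taskInfo
      val metrics = taskEnd.taskMetrics          // can be null for failed tasks
      if (info != null && metrics != null) {
        println(s"task ${info.taskId} on ${info.host}: " +
          s"run=${metrics.executorRunTime} ms, gc=${metrics.jvmGCTime} ms")
      }
    }
  }

  // Register it before kicking off jobs:
  //   sc.addSparkListener(new TaskTimingListener)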