Sorry, I thought with InfiniBand it was their appliance :)
> On 9. Nov 2017, at 23:38, Vadim Semenov <vadim.seme...@datadoghq.com> wrote:
>
> Probably not Oracle but Cloudera 🙂
>
> Jan, I think your DataNodes might be overloaded. I'd suggest reducing
> `spark.executor.cores` if you run executors alongside DataNodes, so the
> DataNode process would get some resources.
>
> The other thing you can do is to increase `dfs.client.socket-timeout` in
> hadoopConf; I see that it's set to 120000 in your case right now.
>
>> On Thu, Nov 9, 2017 at 4:28 PM, Jan-Hendrik Zab <z...@l3s.de> wrote:
>>
>> Jörn Franke <jornfra...@gmail.com> writes:
>>
>> > Maybe contact Oracle support?
>>
>> Something like that would be the last option, I guess; university money
>> is usually hard to come by for such things.
>>
>> > Have you maybe accidentally configured some firewall rules? Routing
>> > issues? Maybe only on one of the nodes...
>>
>> All systems are in the same /16. The nodes don't even have a firewall,
>> and the two masters allow everything from the nodes and masters via the
>> InfiniBand devices.
>>
>> And as I said, mapred jobs work fine, and I haven't seen a single network
>> problem so far except for these messages.
>>
>> Best,
>> -jhz
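A minimal sketch of Vadim's two suggestions, written as a `spark-defaults.conf` fragment. The values are illustrative assumptions and have not been tuned for this cluster. Spark copies properties with the `spark.hadoop.` prefix into the Hadoop configuration, which is one way to raise `dfs.client.socket-timeout` in hadoopConf without editing `hdfs-site.xml`.

```
# spark-defaults.conf (sketch; values are assumptions, not recommendations)

# Fewer cores per executor leaves CPU headroom for a DataNode on the same host.
spark.executor.cores                      2

# The spark.hadoop. prefix is stripped and the rest is set in the Hadoop
# Configuration; dfs.client.socket-timeout is in milliseconds (120000 = 2 min).
spark.hadoop.dfs.client.socket-timeout    300000
```

You can also pass the same keys for a single job with `spark-submit --conf key=value`. This makes it easy to try a larger timeout before changing the cluster-wide defaults.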