Re: Task failures and other problems

Jörn Franke Thu, 09 Nov 2017 15:32:17 -0800

Sorry, I thought that with InfiniBand it was their appliance :)


> On 9. Nov 2017, at 23:38, Vadim Semenov <[email protected]> wrote:
> 
> Probably not Oracle but Cloudera 🙂
> 
> Jan, I think your DataNodes might be overloaded. If you run executors 
> alongside DataNodes, I'd suggest reducing `spark.executor.cores` so that 
> the DataNode process gets some resources.
> 
> The other thing you can do is increase `dfs.client.socket-timeout` in 
> hadoopConf; I see that it's currently set to 120000 (milliseconds) in 
> your case.
> 
>> On Thu, Nov 9, 2017 at 4:28 PM, Jan-Hendrik Zab <[email protected]> wrote:
>> 
>> Jörn Franke <[email protected]> writes:
>> 
>> > Maybe contact Oracle support?
>> 
>> Something like that would be the last option, I guess; university money
>> is usually hard to come by for such things.
>> 
>> > Have you maybe accidentally configured some firewall rules? Routing
>> > issues? Maybe on only one of the nodes...
>> 
>> All systems are in the same /16. The nodes don't even have a firewall,
>> and the two masters allow everything from the nodes and masters via the
>> InfiniBand devices.
>> 
>> And as I said, mapred jobs work fine and I haven't seen one network
>> problem so far except for these messages.
>> 
>> Best,
>>         -jhz
>> 
>> 
>
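
For anyone reading this thread later, Vadim's two suggestions (fewer cores per executor, a longer DFS client socket timeout) could be passed to `spark-submit` roughly as sketched below. This is only an illustration: the helper name `build_spark_conf_args` and the values 3 and 300000 are assumptions and do not come from the thread. It relies on Spark copying properties prefixed with `spark.hadoop.` into the Hadoop configuration it builds, which is one way to reach `dfs.client.socket-timeout` without editing the cluster's own config files.

```python
# Sketch only: the helper name and the example values are hypothetical,
# chosen for illustration; pick values that suit your own cluster.

def build_spark_conf_args(executor_cores, socket_timeout_ms):
    """Return spark-submit --conf arguments for the two suggested settings."""
    if executor_cores < 1:
        raise ValueError("executor_cores must be at least 1")
    if socket_timeout_ms <= 0:
        raise ValueError("socket_timeout_ms must be positive")

    settings = {
        # Fewer cores per executor leaves CPU for a co-located DataNode.
        "spark.executor.cores": str(executor_cores),
        # The "spark.hadoop." prefix makes Spark pass the property on to
        # the Hadoop configuration; the value is in milliseconds.
        "spark.hadoop.dfs.client.socket-timeout": str(socket_timeout_ms),
    }

    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Example: 3 cores per executor and a 300 s timeout (both assumed values).
example_args = build_spark_conf_args(executor_cores=3, socket_timeout_ms=300000)
print("spark-submit " + " ".join(example_args) + " <your application>")
```

The same two keys could instead go into `spark-defaults.conf` or be set on a `SparkConf` object; the command-line form is used here only because it is easy to try for a single job.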
