Hi Wei-Chiu,

We also observed the same issue when the NN replays large edit logs from the JN. It looks like the default max idle time in Jetty 6 is 200 seconds (_maxIdleTime is 200000 ms):
    public abstract class AbstractConnector extends AbstractBuffers implements Connector {
        ....
        protected int _maxIdleTime=200000;
        ....
    }

Thanks,
Jason

On 12/7/20, 9:51 PM, "Wei-Chiu Chuang" <weic...@apache.org> wrote:

Hi community,

I want to share this observation with you. We have received several case reports in which users sometimes experience a JournalNode timeout when the NN requests edits from the JN. The end result is that (both!) NNs crash after the timeout (10 seconds). It seems to happen only to Hadoop 3 users (CDH6 and HDP3).

While HADOOP-15696 <https://issues.apache.org/jira/browse/HADOOP-15696> offers a configurable switch that lets you increase hadoop.http.idle_timeout.ms, this looks like a regression in Hadoop 3, and the NN shouldn't simply crash because a JN is slightly slow. A 10-second timeout for fetching edits from a JN looks simply too low to me.

I believe this regression was introduced when we updated Jetty from 6 to 9 in Hadoop 3 (HADOOP-10075 <https://issues.apache.org/jira/browse/HADOOP-10075>). We replaced SelectChannelConnector.setLowResourceMaxIdleTime() with ServerConnector.setIdleTimeout(), but they aren't the same.
http://archive.eclipse.org/jetty/7.0.0.RC0/apidocs/org/eclipse/jetty/server/nio/SelectChannelConnector.html#getLowResourcesMaxIdleTime()
https://www.eclipse.org/jetty/javadoc/9.4.26.v20200117/org/eclipse/jetty/server/AbstractConnector.html#setIdleTimeout(long)

Does anyone know the behavior back in Hadoop 2/Jetty 6? Did it use Jetty's default idle time, which is 300 seconds?
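
To make the failure mode concrete, here is a minimal sketch of how a server-side idle timeout drops a peer that pauses for longer than the timeout. It uses only plain java.net sockets, not Jetty or Hadoop code; the class name IdleTimeoutSketch, the method survivesIdleTimeout, and the millisecond values are all hypothetical and chosen for illustration. SO_TIMEOUT on the accepted socket merely stands in for the connector's idle timeout, under the assumption that the real connector behaves similarly when no bytes move on the connection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class IdleTimeoutSketch {

    /**
     * Returns true if the server-side read completed before the idle timeout,
     * false if the server gave up on the connection first.
     * Hypothetical helper; not part of Jetty or Hadoop.
     */
    static boolean survivesIdleTimeout(int idleTimeoutMs, int peerPauseMs) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            // Client thread: connect, stay silent for peerPauseMs, then send one byte.
            Thread client = new Thread(() -> {
                try (Socket s = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                    Thread.sleep(peerPauseMs);
                    OutputStream out = s.getOutputStream();
                    out.write(1);
                    out.flush();
                } catch (IOException | InterruptedException ignored) {
                    // The server may already have closed the connection.
                }
            });
            client.start();
            try (Socket accepted = server.accept()) {
                // SO_TIMEOUT plays the role of the connector's idle timeout here.
                accepted.setSoTimeout(idleTimeoutMs);
                InputStream in = accepted.getInputStream();
                return in.read() == 1;
            } catch (SocketTimeoutException e) {
                // No bytes arrived within the timeout: the connection is dropped.
                return false;
            } finally {
                client.join();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // A peer that pauses 500 ms is dropped by a 100 ms timeout but not by a 2000 ms one.
        System.out.println("short timeout survives: " + survivesIdleTimeout(100, 500));
        System.out.println("long timeout survives:  " + survivesIdleTimeout(2000, 500));
    }
}
```

The same peer behavior succeeds or fails depending only on the timeout value, which is why a change in the effective idle timeout between Jetty versions could surface as a regression without any change in JN or NN speed.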