[ https://issues.apache.org/jira/browse/HADOOP-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne resolved HADOOP-17346.
---------------------------------
    Fix Version/s: 3.2.3
                   3.1.5
                   3.4.0
                   3.3.1
       Resolution: Fixed

Thanks [~ahussein]. I committed to branch-3.1, branch-3.2, branch-3.3, and trunk.

> Fair call queue is defeated by abusive service principals
> ---------------------------------------------------------
>
>                 Key: HADOOP-17346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17346
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common, ipc
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
>         Attachments: HADOOP-17346-branch-3.1.001.patch,
> HADOOP-17346.branch-3.2.001.patch, HADOOP-17346.branch-3.3.001.patch
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> [~daryn] reported that the FCQ prioritizes based on the full Kerberos
> principal (i.e. "user/host@realm") rather than short name (i.e. "user") to
> prevent service principals like the DNs and NMs from being de-prioritized,
> since service principals are expected to be well behaved. Notably the DNs
> contribute a significant but important load so the intent is not to
> de-prioritize all DNs because their sum total load is high relative to users.
> This has the unfortunate side effect of allowing misbehaving & non-critical
> service principals to abuse the FCQ. The gstorm/* principals are a prime
> example. Each server is spamming opens as fast as possible, which ensures
> that none of the gstorm servers can be de-prioritized because each principal
> is a fraction of the total load from all principals.
> The secondary and more devastating problem is that other abusive non-service
> principals cannot be effectively de-prioritized. The sum total of all gstorm
> load prevents other principals from surpassing the priority thresholds.
> Principals stay in the highest priority queues, which allows the abusive
> principals to overflow the entire call queue for extended periods of time.
> Notably it prevents the FCQ from moderating the heavy create loads from p_gup
> @ DB which cause significant performance degradation.
> Prioritization should be based on short name with configurable exemptions for
> services like the DN/NM.
> [~daryn] suggested a solution that we applied on our clusters.
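
To make the proposal in the description concrete, here is a minimal sketch of keying callers by short name with an exemption list. It is not the committed patch: the class `CallerKeySketch`, its methods, and the exempt names "dn" and "nm" are hypothetical, the short name is derived by simply cutting at the first '/' or '@' rather than by Hadoop's auth_to_local rules, and "exemption" is read here as "keep the full principal as the accounting key", which is only one possible interpretation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names come from the Hadoop code base.
final class CallerKeySketch {

    // Simplified short name: everything before the first '/' or '@'.
    // (Assumption: real clusters map principals via auth_to_local rules.)
    static String shortName(String principal) {
        int end = principal.length();
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@');
        if (slash >= 0) end = Math.min(end, slash);
        if (at >= 0) end = Math.min(end, at);
        return principal.substring(0, end);
    }

    // Exempt services keep their full principal as the key, so each host is
    // accounted separately; everyone else is aggregated under the short name.
    static String schedulingKey(String principal, Set<String> exemptShortNames) {
        String name = shortName(principal);
        return exemptShortNames.contains(name) ? principal : name;
    }

    // Counts calls per scheduling key, preserving first-seen order.
    static Map<String, Integer> countCalls(List<String> callers, Set<String> exempt) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String principal : callers) {
            counts.merge(schedulingKey(principal, exempt), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Set<String> exempt = Set.of("dn", "nm"); // assumed exemption list
        List<String> callers = List.of(
            "gstorm/host1@EXAMPLE.COM",
            "gstorm/host2@EXAMPLE.COM",
            "gstorm/host3@EXAMPLE.COM",
            "dn/host1@EXAMPLE.COM",
            "alice@EXAMPLE.COM");
        System.out.println(countCalls(callers, exempt));
    }
}
```

With this keying, the three gstorm hosts collapse into a single "gstorm" entry whose combined load can cross a priority threshold, while the exempt DN principal is still counted per host.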