Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/21616 )
Change subject: IMPALA-12345: Add user quotas to Admission Control ...................................................................... Patch Set 11: (9 comments) Thanks for the comments http://gerrit.cloudera.org:8080/#/c/21616/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21616/10//COMMIT_MSG@17 PS10, Line 17: TPoolStats (the data that is shared between Admission Control instances) : is extended to include a map from user name to a count of queries : running. > How scaleable is this map if there are hundreds of users active? Any memory I imagine that the number of users that can concurrently have queries in an Impala system must be of the order of 1000? The memory used per topic would be approx user_name_length + 8 bytes per user so this should not be significant. As to scalability of the map it's a std::map so that should be OK at these sort of numbers. http://gerrit.cloudera.org:8080/#/c/21616/10/be/src/scheduling/admission-controller.h File be/src/scheduling/admission-controller.h: http://gerrit.cloudera.org:8080/#/c/21616/10/be/src/scheduling/admission-controller.h@586 PS10, Line 586: /// Export user names to a metrics set. : void export_users(SetMetric<std::string>* metrics) const; > Is this scalable if there are hundreds of users? First of all, if user quotas are not configured then there is only a trivial increase in any processing as the user load maps will not be populated. I tried to evaluate the processing code user quotas are configured in two ways. Firstly running microbenchmarks (using the admission-controller-test code), I tested the time for the StatestoreSubscriber callback method UpdatePoolStats(). Running this method with a simulated 10,000 users took less than 20 ms. Secondly using the mini cluster I ran 50 trivial queries with 50 different users. I ran this load on 2 configurations, one with user quotas enabled and one without, and looked at statestore-subscriber.topic-impala-request-queue.processing-time-s which is the elapsed time running UpdatePoolStats(). The maximum time of processing increased from ~15 ms to ~27 ms. Note that this processing is not part of the query execution path. I assert that although there is some processing cost it should not impact scalability with the number of concurrent users that we would expect in an Impala system. http://gerrit.cloudera.org:8080/#/c/21616/10/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/21616/10/common/thrift/metrics.json@293 PS10, Line 293: "description": "Resource Pool $0 Aggregate Users", > Is it important to show all currently active user names? Is displaying dist I found it useful for debugging to see the list, plus it provides some observability for testing. http://gerrit.cloudera.org:8080/#/c/21616/10/fe/src/main/java/org/apache/impala/util/JniRequestPoolService.java File fe/src/main/java/org/apache/impala/util/JniRequestPoolService.java: http://gerrit.cloudera.org:8080/#/c/21616/10/fe/src/main/java/org/apache/impala/util/JniRequestPoolService.java@135 PS10, Line 135: public static byte[] getHadoopGroupsInternal(byte[] serializedRequest) > This is duplicate with exact code in JniFrontend.java Done http://gerrit.cloudera.org:8080/#/c/21616/10/fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java File fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java: http://gerrit.cloudera.org:8080/#/c/21616/10/fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java@41 PS10, Line 41: import org.junit.Test; : import org.junit.rules.TemporaryFolder; : : import static java.util.Arrays.asList; : import static org.apache.hadoop.fs.CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTH_TO_LOCAL; : import static org.apache.impala.yarn.server.resourcemanager.scheduler.fair. : AllocationFileLoaderService.addQueryLimits; : : import org.apache.hadoop.conf.Configuration; : import org.apache.hadoop.util.StringUtils; : > Unnecessary import reordering? OK I redid the imports to avoid clang-format's reorderng http://gerrit.cloudera.org:8080/#/c/21616/10/fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java@207 PS10, Line 207: String expectedMessage = "Failed to resolve user 'userA' to a pool > nit: I think existing tests that verify poolService_.hasAccess() can be imp Done, assuming I understand what you mean http://gerrit.cloudera.org:8080/#/c/21616/10/fe/src/test/resources/fair-scheduler-test.xml File fe/src/test/resources/fair-scheduler-test.xml: http://gerrit.cloudera.org:8080/#/c/21616/10/fe/src/test/resources/fair-scheduler-test.xml@3 PS10, Line 3: <queue name="root"> : <userQueryLimit> : <user>userB</user> : <totalCount>6</totalCount> : </userQueryLimit> : <userQueryLimit> : <user>*</user> : <totalCount>10</totalCount> : </userQueryLimit> : <groupQueryLimit> : <group>group3</group> : <totalCount>5</totalCount> : </groupQueryLimit> > nit: Discussion. When there are limits placed at the root then the quotas are enforced across the pools. So here userB is limited to running 6 queries concurrently across all the pools. So this is more about aggregating across pools than about the the non-default pool. The "Max Running Queries" and "Max Queued Queries" are useful for admins who don't want to enforce per-user quotas. http://gerrit.cloudera.org:8080/#/c/21616/10/fe/src/test/resources/fair-scheduler-test.xml@27 PS10, Line 27: <queue name="queueD"> > What will happen if there is per-pool limit set in llama config? The llama.am.throttling.maximum.placed.reservations.* rules are enforced. There is a case in the UserAndGroupQuotas test in admission-controller-test.cc http://gerrit.cloudera.org:8080/#/c/21616/10/fe/src/test/resources/fair-scheduler-test.xml@28 PS10, Line 28: userA,userB > This use comma delimiter, but L21 use space delimiter. If I read the code right, this aclSubmitApps is parsed using Hadoop code https://github.com/apache/hadoop/blob/4e6432a0abf2865853b05eceb61f68ebf3512c47/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/authorize/AccessControlList.java#L118 I did not change the parsing or handling of aclSubmitApps, I just added a test. My interpretation of the docs is that there is a list of comma separated users, followed by a whitespace, followed by a comma separated list of groups. As aclSubmitApps is already in use in the wild, it would be hard to change, and probably beyond the scope of this change -- To view, visit http://gerrit.cloudera.org:8080/21616 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c33f3f2427db57fb9b6c593a4b22d5029549b41 Gerrit-Change-Number: 21616 Gerrit-PatchSet: 11 Gerrit-Owner: Andrew Sherman <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Andrew Sherman <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Comment-Date: Mon, 14 Oct 2024 16:55:59 +0000 Gerrit-HasComments: Yes
