[Impala-ASF-CR] IMPALA-12345: Add user quotas to Admission Control

Andrew Sherman (Code Review) Mon, 14 Oct 2024 10:01:33 -0700

Andrew Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21616 )


Change subject: IMPALA-12345: Add user quotas to Admission Control
......................................................................


Patch Set 11:

(9 comments)

Thanks for the comments

http://gerrit.cloudera.org:8080/#/c/21616/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21616/10//COMMIT_MSG@17
PS10, Line 17: TPoolStats (the data that is shared between Admission Control 
instances)
             : is extended to include a map from user name to a count of queries
             : running.
> How scaleable is this map if there are hundreds of users active? Any memory
I imagine that the number of users that can concurrently have queries in an 
Impala system must be of the order of 1000? The memory used per topic would be 
approx user_name_length + 8 bytes per user so this should not be significant. 
As to scalability of the map it's a std::map so that should be OK at these sort 
of numbers.


http://gerrit.cloudera.org:8080/#/c/21616/10/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/21616/10/be/src/scheduling/admission-controller.h@586
PS10, Line 586:     /// Export user names to a metrics set.
              :     void export_users(SetMetric<std::string>* metrics) const;
> Is this scalable if there are hundreds of users?
First of all, if user quotas are not configured then there is only a trivial 
increase in any processing as the user load  maps will not be populated.
I tried to evaluate the processing code user quotas are configured in two ways.
Firstly running microbenchmarks (using the admission-controller-test code), I 
tested the time for the StatestoreSubscriber callback method UpdatePoolStats().
Running this method with a simulated 10,000 users took less than 20 ms.
Secondly using the mini cluster I ran 50 trivial queries with 50 different 
users.
I ran this load on 2 configurations, one with user quotas enabled and one 
without, and looked at 
statestore-subscriber.topic-impala-request-queue.processing-time-s which is the 
elapsed time running UpdatePoolStats(). The maximum time of processing 
increased from ~15 ms to ~27 ms. Note that this processing is not part of the 
query execution path.
I assert that although there is some processing cost it should not impact 
scalability with the number of concurrent users  that we would expect in an 
Impala system.


http://gerrit.cloudera.org:8080/#/c/21616/10/common/thrift/metrics.json
File common/thrift/metrics.json:

http://gerrit.cloudera.org:8080/#/c/21616/10/common/thrift/metrics.json@293
PS10, Line 293:     "description": "Resource Pool $0 Aggregate Users",
> Is it important to show all currently active user names? Is displaying dist
I found it useful for debugging to see the list, plus it provides some 
observability for testing.


http://gerrit.cloudera.org:8080/#/c/21616/10/fe/src/main/java/org/apache/impala/util/JniRequestPoolService.java
File fe/src/main/java/org/apache/impala/util/JniRequestPoolService.java:

http://gerrit.cloudera.org:8080/#/c/21616/10/fe/src/main/java/org/apache/impala/util/JniRequestPoolService.java@135
PS10, Line 135:   public static byte[] getHadoopGroupsInternal(byte[] 
serializedRequest)
> This is duplicate with exact code in JniFrontend.java
Done


http://gerrit.cloudera.org:8080/#/c/21616/10/fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java
File fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java:

http://gerrit.cloudera.org:8080/#/c/21616/10/fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java@41
PS10, Line 41: import org.junit.Test;
             : import org.junit.rules.TemporaryFolder;
             :
             : import static java.util.Arrays.asList;
             : import static 
org.apache.hadoop.fs.CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTH_TO_LOCAL;
             : import static 
org.apache.impala.yarn.server.resourcemanager.scheduler.fair.
             :     AllocationFileLoaderService.addQueryLimits;
             :
             : import org.apache.hadoop.conf.Configuration;
             : import org.apache.hadoop.util.StringUtils;
             :
> Unnecessary import reordering?
OK I redid the imports to avoid clang-format's reorderng


http://gerrit.cloudera.org:8080/#/c/21616/10/fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java@207
PS10, Line 207: String expectedMessage = "Failed to resolve user 'userA' to a 
pool
> nit: I think existing tests that verify poolService_.hasAccess() can be imp
Done, assuming I understand what you mean


http://gerrit.cloudera.org:8080/#/c/21616/10/fe/src/test/resources/fair-scheduler-test.xml
File fe/src/test/resources/fair-scheduler-test.xml:

http://gerrit.cloudera.org:8080/#/c/21616/10/fe/src/test/resources/fair-scheduler-test.xml@3
PS10, Line 3:   <queue name="root">
            :     <userQueryLimit>
            :       <user>userB</user>
            :       <totalCount>6</totalCount>
            :     </userQueryLimit>
            :     <userQueryLimit>
            :       <user>*</user>
            :       <totalCount>10</totalCount>
            :     </userQueryLimit>
            :     <groupQueryLimit>
            :       <group>group3</group>
            :       <totalCount>5</totalCount>
            :     </groupQueryLimit>
> nit: Discussion.
When there are limits placed at the root then the quotas are enforced across 
the pools. So here userB is limited to running 6 queries concurrently across 
all the pools. So this is more about aggregating across pools than about the 
the non-default pool.

The "Max Running Queries" and "Max Queued Queries" are useful for admins who 
don't want to enforce per-user quotas.


http://gerrit.cloudera.org:8080/#/c/21616/10/fe/src/test/resources/fair-scheduler-test.xml@27
PS10, Line 27:     <queue name="queueD">
> What will happen if there is per-pool limit set in llama config?
The llama.am.throttling.maximum.placed.reservations.* rules are enforced. There 
is a case in the UserAndGroupQuotas test in admission-controller-test.cc


http://gerrit.cloudera.org:8080/#/c/21616/10/fe/src/test/resources/fair-scheduler-test.xml@28
PS10, Line 28: userA,userB
> This use comma delimiter, but L21 use space delimiter.
If I read the code right, this aclSubmitApps is parsed using Hadoop code 
https://github.com/apache/hadoop/blob/4e6432a0abf2865853b05eceb61f68ebf3512c47/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/authorize/AccessControlList.java#L118
I did not change the parsing or handling of aclSubmitApps, I just added a test.
My interpretation of the docs is that there is a list of comma separated users, 
followed by a whitespace, followed by a comma separated list of groups.
As aclSubmitApps is already in use in the wild, it would be hard to change, and 
probably beyond the scope of this change



--
To view, visit http://gerrit.cloudera.org:8080/21616
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c33f3f2427db57fb9b6c593a4b22d5029549b41
Gerrit-Change-Number: 21616
Gerrit-PatchSet: 11
Gerrit-Owner: Andrew Sherman <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Andrew Sherman <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Mon, 14 Oct 2024 16:55:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12345: Add user quotas to Admission Control

Reply via email to