[ 
https://issues.apache.org/jira/browse/KUDU-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-3212:
--------------------------------
    Description: 
Current implementation of location assignment has some room for improvement.  
As of now, the following is understood:

# Implementation-wise, Kudu masters could use the newly introduced 
[Subprocess|https://github.com/apache/kudu/tree/master/src/kudu/subprocess] 
functionality to run the location assignment script.  That would be more robust 
than the current fork/exec approach, especially for larger deployments where 
Kudu masters might see a high request rate (many active threads running, a lot 
of memory allocated, etc.).
# Conceptually, Kudu tablet servers could have all the necessary information 
regarding their location at startup, and that information isn't going to change 
while the tablet server is running.  The server/machine they are running on is 
provisioned to be in some rack, availability zone, data center, etc., and that 
assignment doesn't change while the server is up and running.  So, a Kudu 
tablet server can be provided with information about its location upon startup; 
there is no need to consult the Kudu master about this.
# Conceptually, Kudu clients might be aware of their location as well.

To address item 1, it's necessary to update the current implementation of 
location assignment so that the script is run by a dedicated subprocess forked 
off early during the master's startup.  Ideally, to make it more robust, the 
subprocess server can run the location assignment script as a small server that 
takes an IP address or DNS name on input and provides a location label on 
output, maybe line-by-line.  The latter assumes changing the requirements for a 
location assignment script, and we should probably introduce a separate flag to 
specify the path to a script that runs in such a mode.  However, even with the 
current approach, where a script is run for every location assignment request, 
using the {{Subprocess}} functionality would benefit larger deployments where 
the fork/exec sequence for a {{kudu-master}} process is slow and inefficient.
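
For illustration only, a long-running, line-oriented location assignment script might look roughly like the Python sketch below.  Nothing here is an existing Kudu interface: the host-to-location table, the {{/unknown}} fallback label and the function names are assumptions made up for the example, and the actual protocol between the subprocess and the script is still to be designed.

```python
import sys

# Hypothetical static mapping from host identifiers to location labels;
# a real script would consult the site's own inventory source instead.
LOCATIONS = {
    "10.0.1.15": "/dc-a/rack-01",
    "10.0.2.27": "/dc-a/rack-02",
    "tserver-3.example.com": "/dc-b/rack-07",
}
DEFAULT_LOCATION = "/unknown"  # assumed fallback label for unknown hosts


def lookup(host: str) -> str:
    """Return the location label for an IP address or DNS name."""
    return LOCATIONS.get(host.strip(), DEFAULT_LOCATION)


def serve(inp, out) -> None:
    """Answer one request per input line until the input is closed."""
    for line in inp:
        host = line.strip()
        if not host:
            continue  # ignore blank lines
        out.write(lookup(host) + "\n")
        out.flush()  # reply right away so the caller is not left waiting


# To use this as a script, call: serve(sys.stdin, sys.stdout)
```

The point of the sketch is that the script is started once and then answers many requests, so the cost of starting it is paid once rather than per location assignment request.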

To address item 2, it's necessary to introduce a new tablet server flag that 
is set to the location assigned to the tablet server.  The systemd/init.d 
startup script for kudu-tserver should populate the flag with the proper value.  
It's also necessary to introduce a new field in the {{TSHeartbeatRequestPB}} 
message to pass the location from the tablet server to the master.  If the 
master sees the field populated, it should not run the location assignment 
script, even if one is specified (i.e. the {{\-\-location_mapping_cmd}} flag 
is set).  This way it would be possible to perform rolling upgrades from 
older versions, which use a centrally managed location assignment script, to 
the version that implements the new approach.
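
The precedence rule for item 2 could be sketched as follows.  This is only a Python model of the decision the master would make, not Kudu code: {{resolve_tserver_location}} and its parameters are hypothetical names, and {{run_mapping_script}} stands in for whatever mechanism runs the script configured via {{\-\-location_mapping_cmd}}.

```python
from typing import Callable, Optional


def resolve_tserver_location(
    reported_location: Optional[str],
    host: str,
    run_mapping_script: Optional[Callable[[str], str]],
) -> Optional[str]:
    """Pick the location for a tablet server that has just heartbeated.

    reported_location: value of the proposed new heartbeat field, or None
        when an older tablet server does not send it.
    run_mapping_script: callable standing in for the configured location
        assignment script, or None when no script is configured.
    """
    if reported_location:
        # A self-reported location wins; the script is not run at all.
        return reported_location
    if run_mapping_script is not None:
        # Older tablet servers still get a centrally assigned location.
        return run_mapping_script(host)
    return None  # no location information available
```

With this ordering, upgraded and not-yet-upgraded tablet servers can coexist during a rolling upgrade: the former report their location themselves, the latter keep relying on the script.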

To address item 3, it's necessary to find a means to specify the location for a 
Kudu client.  An environment variable could probably be used for that.  The 
{{ConnectToMasterRequestPB}} message can be extended to include an optional 
{{client_location}} field.  In addition, if 
{{\-\-master_client_location_assignment_enabled}} is set to {{true}}, the master 
could run the location assignment script to assign a location to a client that 
doesn't populate the newly introduced 
{{ConnectToMasterRequestPB::client_location}} field.
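
A rough Python sketch of how item 3 might fit together is shown below.  The environment variable name {{KUDU_CLIENT_LOCATION}} and both function names are assumptions chosen for the example; no such variable exists today, and the real change would live in the client library and the master.

```python
import os
from typing import Callable, Mapping, Optional

# Hypothetical environment variable name, used for illustration only.
CLIENT_LOCATION_ENV = "KUDU_CLIENT_LOCATION"


def client_location_from_env(
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Client side: the value to put into the proposed client_location field."""
    value = env.get(CLIENT_LOCATION_ENV, "").strip()
    return value or None  # None means: leave the optional field unset


def master_assign_client_location(
    client_location: Optional[str],
    client_host: str,
    assignment_enabled: bool,
    run_mapping_script: Optional[Callable[[str], str]],
) -> Optional[str]:
    """Master side: prefer the client's own location, else fall back."""
    if client_location:
        return client_location
    # assignment_enabled models --master_client_location_assignment_enabled.
    if assignment_enabled and run_mapping_script is not None:
        return run_mapping_script(client_host)
    return None
```

A client that knows its location never triggers the script; one that doesn't is handled as today, subject to the master-side flag.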

  was:
Current implementation of location assignment has some room for improvement.  
As of now, the following is understood:

# Implementation-wise, Kudu masters could use newly introduced 
[Subprocess|https://github.com/apache/kudu/tree/master/src/kudu/subprocess] 
functionality to run location assignment script.  That would be more robust 
than using current fork/exec approach to run the script, especially for larger 
deployments where Kudu masters might have high request-per-second ratio (many 
active threads running, a lot of memory allocated, etc.)
# Conceptually, Kudu tablet servers could have all the necessary information 
regarding their  location at startup and that information isn't going to change 
while tablet server is running. The server/machine they are running at is 
provisioned to be in some rack, availability zone, data center, etc.  and that 
assignment isn't changing while the server is up and running.  So, a Kudu 
tablet server can be provided with information about its location upon startup; 
there is no need to consult Kudu master about this.
# Conceptually, Kudu clients might be aware of their location as well.

To address item 1, it's necessary to update current implementation of location 
assignment, so the script should be run by a dedicated subprocess forked off 
earlier during master's startup.  Ideally, to make it more robust, the 
subprocess server can run the location assignment script as a small server that 
takes an IP or DNS name on input and provides location label on the output, 
maybe line-by-line.  The latter assumes chaning the requirement for a location 
assignment script, and probably we should introduce a separate flag to to 
specify the path to a script that is capable running in such mode.  However, 
even with current location assignment approach when it's necessary to run a 
script per location assignment request, using the {{Subprocess}} functionality 
would benefit larger deployments where forking a kudu-master process might be 
slow inefficient.

To address item 2, it's necessary to introduce a new tablet server's flag that 
is set to the assigned location for the tablet server.  The systemd/init.d 
startup script for kudu-tserver should populate the flag with proper value.  
It's also necessary to introduce a new field in the {{TSHeartbeatRequestPB}} 
message to pass the location from tablet server to master.  If master sees that 
field populated, it should not run the location assignment script, even if it's 
specified.  This way it would be possible to perform rolling upgrades from 
older versions which use centrally managed location assignment script to the 
version that implements the new approach.

To address item 3, it's necessary to find a means to specify location for a 
Kudu client.  Probably, an environment variable can be used for that.   The 
{{ConnectToMasterRequestPB}} can be extended to include an optional 
{{client_location}} field.  In addition, if 
{{\-\-master_client_location_assignment_enabled}} is set to {{true}}, master 
could run the location assignment script to assign a location to a client which 
doesn't populate the newly introduced 
{{ConnectToMasterRequestPB::client_location}} field.


> Location assignment improvements
> --------------------------------
>
>                 Key: KUDU-3212
>                 URL: https://issues.apache.org/jira/browse/KUDU-3212
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, master, tserver
>    Affects Versions: 1.13.0
>            Reporter: Alexey Serbin
>            Priority: Major
>              Labels: performance, scalability
>
> Current implementation of location assignment has some room for improvement.  
> As of now, the following is understood:
> # Implementation-wise, Kudu masters could use the newly introduced 
> [Subprocess|https://github.com/apache/kudu/tree/master/src/kudu/subprocess] 
> functionality to run the location assignment script.  That would be more 
> robust than the current fork/exec approach, especially for larger deployments 
> where Kudu masters might see a high request rate (many active threads 
> running, a lot of memory allocated, etc.).
> # Conceptually, Kudu tablet servers could have all the necessary information 
> regarding their location at startup, and that information isn't going to 
> change while the tablet server is running.  The server/machine they are 
> running on is provisioned to be in some rack, availability zone, data center, 
> etc., and that assignment doesn't change while the server is up and running.  
> So, a Kudu tablet server can be provided with information about its location 
> upon startup; there is no need to consult the Kudu master about this.
> # Conceptually, Kudu clients might be aware of their location as well.
> To address item 1, it's necessary to update the current implementation of 
> location assignment so that the script is run by a dedicated subprocess 
> forked off early during the master's startup.  Ideally, to make it more 
> robust, the subprocess server can run the location assignment script as a 
> small server that takes an IP address or DNS name on input and provides a 
> location label on output, maybe line-by-line.  The latter assumes changing 
> the requirements for a location assignment script, and we should probably 
> introduce a separate flag to specify the path to a script that runs in such a 
> mode.  However, even with the current approach, where a script is run for 
> every location assignment request, using the {{Subprocess}} functionality 
> would benefit larger deployments where the fork/exec sequence for a 
> {{kudu-master}} process is slow and inefficient.
> To address item 2, it's necessary to introduce a new tablet server flag that 
> is set to the location assigned to the tablet server.  The systemd/init.d 
> startup script for kudu-tserver should populate the flag with the proper 
> value.  It's also necessary to introduce a new field in the 
> {{TSHeartbeatRequestPB}} message to pass the location from the tablet server 
> to the master.  If the master sees the field populated, it should not run the 
> location assignment script, even if one is specified (i.e. the 
> {{\-\-location_mapping_cmd}} flag is set).  This way it would be possible to 
> perform rolling upgrades from older versions, which use a centrally managed 
> location assignment script, to the version that implements the new approach.
> To address item 3, it's necessary to find a means to specify the location for 
> a Kudu client.  An environment variable could probably be used for that.  The 
> {{ConnectToMasterRequestPB}} message can be extended to include an optional 
> {{client_location}} field.  In addition, if 
> {{\-\-master_client_location_assignment_enabled}} is set to {{true}}, the 
> master could run the location assignment script to assign a location to a 
> client that doesn't populate the newly introduced 
> {{ConnectToMasterRequestPB::client_location}} field.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
